Set up libraries and functions

Analysis of RNA-seq data from a neurexin-deficient worm

Introduction

The RNA-seq data in this project originates from sequencing done at the Genomics Centre of QMUL. The samples came from Prof. Filipe Cabreiro's lab (then UCL, now Imperial) from normal and neurexin-deficient worms - the work was mostly carried out by Peter Cooke (then Phd student). Other members in Filipe's lab had helped out at the time with supervision and RNA extraction. The sequencing centre did not achieve good ribosomal depletion as it transpired that they did not have the right kit for C.elegans. Hence, only medium/high expressed genes are seen in these data and ribosomal RNA reads must be removed before analysing the data further.

Aine did a differential gene expression analysis based on counting reads using HTseq count and doing differential expresion with DESEq2 and limma. However, if I remember well, she missed out ~ half the reads due to a problem in one of her shell scripts (when pooling together the lanes, I think she pulled only two out of the four available).

It is now recognised that gene expression analysis should be based on aggregating the results of transcript-based analysis. Below is a list of links that describe how this should be done:

The data

The data comprise 64 FASTQ files in total. There are 8 samples, each split into 4 lanes and each being sequenced in paired-end mode, hence there are: 8 x 4 x 2= 64 files.All data is from reversely-stranded libraries i.e. read 2 is in the same direction as the transcript on the genome. - 4 normal samples, names starting with "N" (biological replicates) - 4 neurexin-deficient samples, names starting with "SG" (biological replicates).

All samples originated from 1-day old adult worms.

Below is an outline of the steps to analyse the RNA-seq data from Aine's project, starting with raw fastq data and ending with mapped reads to the C. elegans transcriptome using kallisto.

All code was run on the departmental server thoth. Where the variable $PROJECT is used below it refers to the following directory: /d/in16/u/ubcg71a/research/filipe/ .

Data quality analysis with FastQC/multiQC

Initial fastqc data quality analysis on raw data was carried out by both Aine and repeated by Krzysztof at some point. Results can be found here: /d/in13/u/kszkop01/worm_neurexin/fastqc/

Had I run it myself I would have used my bash script run_fastqc.sh for running fastQC on multiple files:

#!/bin/bash
# Runs FASTQC on all fastq.gz files found in given directory
# Run as:
# ./run_fastqc dir_of_fastq_files

timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1  #all output will be logged to logfile

FASTQC_DIR="/s/software/fastqc/v0.11.8/FastQC/fastqc"

echo "Running FASTQC using executable: $FASTQC_DIR"

for fastq_file in $1/*.fastq.gz;
do
  echo "Fastq file: $fastq_file"
  $FASTQC_DIR $fastq_file -o .
done

I summarised K's FASTQC results using multiQC:

module use -s /s/mm/modules
module load python/v2
multiqc /d/in13/u/kszkop01/worm_neurexin/fastqc/

Results are under: $PROJECT/multiqc_on_raw/

All samples have sequences of a single length (76 bp).

Diagnostic plots from FASTQC summarised with multiQC (in all these plots, files with R1 are coloured red, files with R2 are blue):

Alt text Alt text Alt text Alt text

Alt text

Alt text

Alt text

Alt text

Observations from initial quality analysis

  • Quality scores are generally very high even for the last few positions of the read; in general R1 (shown in red in the plots above) slightly better than R2 (shown in blue)
  • There are high levels of duplication (most likely rRNA)
  • There are some Ns mostly at the beginning of the reads
  • GC content shows strange distribution with differences between R1 and R2 (I've seen this before in datasets; maybe bias from the bias due to adapters still present?). GC of C. elegans is supposed to have mean base composition of 36% GC (with the rRNA cistrons being closer to 51%; ref: PMID: 4858229). Our plots show higher GC content in all cases but there is a peak where rRNA should be.
  • There is evidence of adapters present in the 3' end

I saved the statistics table from multiqc and read it in here to summarise:

t.stats <- read.table(file="multiqc/stats.txt", header=TRUE, sep="\t")
#use tidyr's separate to split the first column
t.stats <- separate(t.stats, Sample.Name, into=c("sample", "sample_num", "lane", "read", "nothing"), sep="_", remove=FALSE, convert=FALSE)
summary(t.stats$M.Seqs)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.100   4.400   4.750   4.791   5.125   5.900
#print statistics per group
t.stats %>% group_by(sample) %>% summarize(mean=mean(M.Seqs))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 8 x 2
##   sample  mean
##   <chr>  <dbl>
## 1 N1      4.42
## 2 N2      5   
## 3 N3      4.45
## 4 N4      4.42
## 5 SG1     5.25
## 6 SG2     4.22
## 7 SG3     4.9 
## 8 SG4     5.65
t.stats %>% group_by(lane) %>% summarize(mean=mean(M.Seqs))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 4 x 2
##   lane   mean
##   <chr> <dbl>
## 1 L001   4.76
## 2 L002   4.66
## 3 L003   5   
## 4 L004   4.74
t.stats %>% group_by(sample) %>% summarize(mean=mean(X..Dups))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 8 x 2
##   sample  mean
##   <chr>  <dbl>
## 1 N1      82.4
## 2 N2      82.6
## 3 N3      83.4
## 4 N4      79.9
## 5 SG1     81.6
## 6 SG2     80.4
## 7 SG3     81.4
## 8 SG4     81.4
t.stats %>% group_by(lane) %>% summarize(mean=mean(X..Dups))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 4 x 2
##   lane   mean
##   <chr> <dbl>
## 1 L001   81.5
## 2 L002   81.4
## 3 L003   82.0
## 4 L004   81.6
t.stats %>% group_by(sample) %>% summarize(mean=mean(X..GC))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 8 x 2
##   sample  mean
##   <chr>  <dbl>
## 1 N1      49  
## 2 N2      48  
## 3 N3      49  
## 4 N4      48  
## 5 SG1     48.5
## 6 SG2     48  
## 7 SG3     49  
## 8 SG4     49
t.stats %>% group_by(lane) %>% summarize(mean=mean(X..GC))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 4 x 2
##   lane   mean
##   <chr> <dbl>
## 1 L001   48.6
## 2 L002   48.6
## 3 L003   48.6
## 4 L004   48.6

Read pre-processing with fastp

I trimmed (adapters and poly(A) tails) and quality-filtered the reads using the program fastp :

Raw fastq data is under: $PROJECT/data/Data/170519_NS500784_0235_AHT2JYBGX2/Demulti/Data/Intensities/BaseCalls/GC-IN-6725/

To remove poly(A) tails and adapters, I ran the script under $PROJECT/fastp:

nohup ./run_fastp.sh ../data/Data/170519_NS500784_0235_AHT2JYBGX2/Demulti/Data/Intensities/BaseCalls/GC-IN-6725/ >& run_fastp.out &

Below is the run_fastp.sh script:

#!/bin/bash
# Runs fastp in PE mode for all .fastq.gz files in a directory
# Run as:
# ./run_fastp.sh directory_of_samples 

timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1  #all output will be logged to logfile

FASTP_EXEC="/d/in7/s/fastp/fastp"
DIR=$1
ADAPTER_FILE="./adapters_all.fa"

echo "Running fastp using executable: $FASTP_EXEC"

for file in `ls $DIR/*_R1_*.fastq.gz`;
do
  sample=${file/$DIR\//}
  sample=${sample/_R1_001.fastq.gz/}
  echo "Sample= $sample"
  $FASTP_EXEC -i  "$DIR/$sample"_R1_001.fastq.gz  \
              -I  "$DIR/$sample"_R2_001.fastq.gz \
         -o "$sample"_R1_001_trimmed.fastq.gz \
         -O "$sample"_R2_001_trimmed.fastq.gz \
         --adapter_fasta $ADAPTER_FILE \
         -l 30 --trim_poly_x \
         -h "$sample"_fastp.html -j "$sample"_fastp.json
done

NOTE: fastp by default uses the following quality filtering options:

  • 40% of bases are allowed to be at phred quality Q<15 (unqualified)
  • reads with at least 5 Ns are discarded
  • average quality of the read is not checked by default

Data quality re-analysis with FastQC/multiQC following fastp

Re-ran fastQC with the files in the fastp directory: ./run_fastp.sh ../fastp Results in the directory: $PROJECT/fastqc_after_fastp

Then re-ran multiqc and moved the .html file locally to save the plots shown below:

Alt text

Alt text

Alt text

Alt text

Alt text

Alt text

Alt text

Alt text

Alt text

Alt text

Alt text

Alt text

Results are under $PROJECT/fastqc_after_fastp/multiqc

Observations from re-analysis of quality

  • There are still Ns mostly at the beginning of the reads but these are only in small percentage of reads and only in the first 1-3 bases so will leave them in.
  • GC content shows the same strange distribution hence this has nothing to do with adapters and either to do with biases in the library creation or, more likely, the rRNA that is still present.
  • Adapters have been successfully removed.

Merging data from the 4 lanes to have one file per sample

Seeing that there were no strong lane effects, I merged all 4 lanes for each sample (this could have been done following mapping but there really was no evidence here to suggest a lane effect). The new files are in the $PROJECT/fastp_merged directory and the script run_fastq_merger.sh is given below:

#!/bin/bash
# Merges all fastq files for the same sample and same read orientation into single fastqc 
# Run as:
# ./run_fastq_merger.sh directory_of_fastq_files sample_name sample_number

timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1  #all output will be logged to logfile

MERGED_DIR="/d/in16/u/ubcg71a/research/filipe/fastp_merged/"

cd $1
sample=$2
samplenum=$3
echo $(pwd)
echo Merging for sample $sample 
cat "$sample"*"$samplenum"*R1_001*fastq.gz > $MERGED_DIR/"$sample"_"$samplenum"_R1.fastq.gz
cat "$sample"*"$samplenum"*R2_001*fastq.gz > $MERGED_DIR/"$sample"_"$samplenum"_R2.fastq.gz
./run_fastq_merger.sh ../fastp/ N1 S5
./run_fastq_merger.sh ../fastp/ N2 S6
./run_fastq_merger.sh ../fastp/ N3 S7
./run_fastq_merger.sh ../fastp/ N4 S8
./run_fastq_merger.sh ../fastp/ SG1 S1
./run_fastq_merger.sh ../fastp/ SG2 S2
./run_fastq_merger.sh ../fastp/ SG3 S3
./run_fastq_merger.sh ../fastp/ SG4 S4

Mapping processed reads to the genome with STAR

Before carrying out the analysis with kallisto/sleuth, we will map the reads using STAR and carry out some exploratory analysis of the data.

Aine's work showed a huge number of reads mapping to rRNA.Here, we will allow them to map and exclude them later. During the kallisto run, we will remove them from the transcriptome before mapping to save hassle.

First, we need to create an index of the genome with STAR:

module load star/v2.7

STAR --runThreadN 8 --runMode genomeGenerate --genomeDir . --genomeFastaFiles ../Caenorhabditis_elegans.WBcel235.dna.toplevel.fa --sjdbGTFfile ../Caenorhabditis_elegans.WBcel235.99.gtf --genomeSAindexNbases 12
#(note that parameter genomeSAindexNbases must be scaled down to 12 from the default 14 because the genome
#is smaller than the human genome)

#The genome index is currently under: /d/in16/u/ubcg71a/research/genomes/c.elegans/ensembl_99/STAR_INDEX/

#Once the genome index is created we can map the files using a bash script:
nohup ./run_star.sh ../fastp/ list_files > & run_star.out &
#where list_files contains a list of all samples
#N1_S5
#N2_S6
#N4_S8
#N3_S7
#SG1_S1
#SG2_S2
#SG3_S3
#SG4_S4

The script run_star.sh is given below:

#!/bin/bash

# Runs STAR mapping on all samples in a given directory 
# Samples must end in .fastq.gz and are given as a list in a file (second argument)

# Run as:
# run_star.sh directory_of_fastq_files list_of_samples 

timestamp=`date "+%Y%m%d-%H%M%S"`
logfile="run_$timestamp.log"
exec > $logfile 2>&1  #all output will be logged to logfile

dir=$1
shift

#set location of executables
STAR_EXEC="/s/software/STAR/bin/Linux_x86_64/STAR"

numProc=5   #number of processors/threads to be used
genomeIndex="/d/in16/u/ubcg71a/research/genomes/c.elegans/ensembl_99/STAR_INDEX/" #directory for genome index files 

for sample in `cat $1`;
do
   echo "Running STAR on sample $sample (paired-end reads) ..."

   inputFile1="$dir$sample"_R1.fastq.gz
   inputFile2="$dir$sample"_R2.fastq.gz

   STAR --runThreadN $numProc \
        --runMode alignReads \
        --outSAMtype BAM SortedByCoordinate \
        --readFilesCommand zcat \
        --genomeDir $genomeIndex \
        --outFileNamePrefix "$sample"_ \
        --readFilesIn $inputFile1 $inputFile2 \
        --outReadsUnmapped Fastx \
        --quantMode GeneCounts

done

echo "All done!"

STAR produces separate files with the unmapped reads (which we can use, for example, to map to circular RNA transcripts later) and it also produces gene counts. Ideally one should rely on transcript counts for differential gene expression; here we will use gene counts for the exploratory part of the analysis only.

I summarised the STAR output by running multiqc again. Results are under $PROJECT/star_mapping/multiqc/

Alt text Alt text Alt text

Exploratory data analysis with STAR gene counts

I start by defining my own version of DESeq2's plotPCA() function so that I can define which PCs I will be plotting

#define my own pcaplot to allow other components besides PC1 and PC2 to be plotted
#the function itself is copied from DESEq2
myPlotPCA <- function (x, intgroup = "condition", pc1 = 1, pc2 =2, ntop = 500, returnData = FALSE) 
{
  rv <- genefilter::rowVars(assay(x)) #if using with SummarizedExperiment objects
  select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, 
                                                     length(rv)))]
  pca <- prcomp(t(assay(x)[select, ]))
  percentVar <- pca$sdev^2/sum(pca$sdev^2)
  if (!all(intgroup %in% names(colData(x)))) {
     stop("the argument 'intgroup' should specify columns of colData(dds)")
  }
  intgroup.df <- as.data.frame(colData(x)[, intgroup, drop = FALSE])
  group <- if (length(intgroup) > 1) {
            factor(apply(intgroup.df, 1, paste, collapse = " : "))
        }
        else {
            colData(x)[[intgroup]]
        }
  
  d <- data.frame(PC1 = pca$x[, pc1], PC2 = pca$x[, pc2], group = group, 
                  intgroup.df, names = colnames(x))
  if (returnData) {
    attr(d, "percentVar") <- percentVar[c(pc1,pc2)]
    return(d)
  }
  ggplot(data = d, 
         aes_string(x = "PC1", y = "PC2",  color = "group"))+
    geom_point(size=3) +
    xlab(paste0("PC",pc1,": ", round(percentVar[pc1] * 100), "% variance")) +
    ylab(paste0("PC",pc2,": ", round(percentVar[pc2] * 100), "% variance"))
}

Using rtracklayer and the gtf file to obtain gene names

gtf <- rtracklayer::import('genomes/Caenorhabditis_elegans.WBcel235.99.gtf')
gtf_df=as.data.frame(gtf)
t2g <- data.frame(target_id = gtf_df$transcript_id,
                  ens_gene = gtf_df$gene_id,
                  ext_gene = gtf_df$gene_name)
t2g <- t2g %>% drop_na()
t2g <- dplyr::distinct(t2g)

The gene counts are in the .tab files of STAR. I moved those locally and use code from: http://biostars.org/p/241602 to read in counts so they can be used with DESeq2.

library(ggplot2)
library(DESeq2)
## Loading required package: SummarizedExperiment
## Loading required package: DelayedArray
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
## 
##     anyMissing, rowMedians
## The following object is masked from 'package:dplyr':
## 
##     count
## Loading required package: BiocParallel
## 
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
## 
##     colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following object is masked from 'package:clusterProfiler':
## 
##     simplify
## The following objects are masked from 'package:base':
## 
##     aperm, apply, rowsum
#First, read in the gene counts from STAR
number<- 4 #reverse-stranded library so counts should be in column 4
ff <- list.files( path = "./star_mapping", pattern = "*ReadsPerGene.out.tab$", full.names = TRUE )
counts.files <- lapply( ff, read.table, skip = 4 )
starcounts <- as.data.frame( sapply( counts.files, function(x) x[ , number ] ) )
ff <- gsub( "[_]ReadsPerGene[.]out[.]tab", "", ff )
ff <- gsub( "[.]/star_mapping/", "", ff )
colnames(starcounts) <- ff
row.names(starcounts) <- counts.files[[1]]$V1
rm(counts.files)

coldata <- read.table("metadata.txt", header=T)
rownames(coldata) <- coldata[,1]
coldata
##        sample condition
## N1_S5   N1_S5   control
## N2_S6   N2_S6   control
## N3_S7   N3_S7   control
## N4_S8   N4_S8   control
## SG1_S1 SG1_S1  neurexin
## SG2_S2 SG2_S2  neurexin
## SG3_S3 SG3_S3  neurexin
## SG4_S4 SG4_S4  neurexin
#Check row and column names match in the same order
rownames(coldata)  == colnames(starcounts)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
dds = DESeqDataSetFromMatrix(countData = starcounts, 
                             colData = coldata, 
                             design = ~ condition )
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
dds
## class: DESeqDataSet 
## dim: 46904 8 
## metadata(1): version
## assays(1): counts
## rownames(46904): WBGene00197333 WBGene00198386 ... WBGene00010967
##   WBGene00014473
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#pre-filter to remove genes with too few counts
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds
## class: DESeqDataSet 
## dim: 14163 8 
## metadata(1): version
## assays(1): counts
## rownames(14163): WBGene00015153 WBGene00002061 ... WBGene00010966
##   WBGene00010967
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#check quickly the distribution of the other counts
quantile(rowSums(counts(dds)))
##       0%      25%      50%      75%     100% 
##       10       51      364     2168 83054802
#and in more detail
quantile(rowSums(counts(dds)), probs=seq(0.9,1,0.005))
##         90%       90.5%         91%       91.5%         92%       92.5% 
##     5149.00     5294.22     5505.68     5756.92     6024.16     6250.85 
##         93%       93.5%         94%       94.5%         95%       95.5% 
##     6603.22     6954.88     7311.76     7756.54     8130.50     8767.13 
##         96%       96.5%         97%       97.5%         98%       98.5% 
##     9355.20    10231.64    11092.54    12633.10    14999.88    18921.49 
##         99%       99.5%        100% 
##    26026.46    38536.55 83054802.00
vsd <- vst(dds, blind=TRUE)

#note that by default only the top 500 genes in terms of variance are used for this PCA plot
plotPCA(vsd, intgroup=c("condition"))

This initial PCA looks promising with good separation of the samples. However, the rRNAs remain in the counts as shown above and they could be skewing the results.

We need to remove all counts corresponding to rRNA transcript_biotype reads (there are 23) - the genome/transcriptome files are from Ensembl and can be found in this directory: /d/in16/u/ubcg71a/research/genomes/c.elegans

 zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | grep biotype:rRNA > list_of_biotype_rRNA_transcripts.txt
 
#keep gene names only
awk '{print $4}' list_of_biotype_rRNA_transcripts.txt | sed 's/gene://' | sed 's/\..//' > list_biotype_rRNA_genes.txt

We will now move this list of rRNA genes locally and use it to filter out the unwanted counts.

library("pheatmap")

#clean up
rm(ff, coldata, starcounts, dds, vsd)

number<- 4 #reverse-stranded library so counts should be in column 4
ff <- list.files( path = "./star_mapping", pattern = "*ReadsPerGene.out.tab$", full.names = TRUE )
counts.files <- lapply( ff, read.table, skip = 4 )
starcounts <- as.data.frame( sapply( counts.files, function(x) x[ , number ] ) )
ff <- gsub( "[_]ReadsPerGene[.]out[.]tab", "", ff )
ff <- gsub( "[.]/star_mapping/", "", ff )
colnames(starcounts) <- ff
row.names(starcounts) <- counts.files[[1]]$V1
rm(counts.files)

coldata <- read.table("metadata.txt", header=T)
rownames(coldata) <- coldata[,1]
coldata
##        sample condition
## N1_S5   N1_S5   control
## N2_S6   N2_S6   control
## N3_S7   N3_S7   control
## N4_S8   N4_S8   control
## SG1_S1 SG1_S1  neurexin
## SG2_S2 SG2_S2  neurexin
## SG3_S3 SG3_S3  neurexin
## SG4_S4 SG4_S4  neurexin
#Check row and column names match in the same order
rownames(coldata)  == colnames(starcounts)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#now remove the rRNAs from the counts table
rRNAgenes <- read.table(file="list_biotype_rRNA_genes.txt", header=F)
rRNAgenes
##                V1
## 1  WBGene00004677
## 2  WBGene00004512
## 3  WBGene00004513
## 4  WBGene00004567
## 5  WBGene00004622
## 6  WBGene00014454
## 7  WBGene00014472
## 8  WBGene00014621
## 9  WBGene00077465
## 10 WBGene00077466
## 11 WBGene00077467
## 12 WBGene00077468
## 13 WBGene00077469
## 14 WBGene00077470
## 15 WBGene00077471
## 16 WBGene00077472
## 17 WBGene00077473
## 18 WBGene00077474
## 19 WBGene00077475
## 20 WBGene00077476
## 21 WBGene00077477
## 22 WBGene00189966
## 23 WBGene00235197
starcounts.norRNA <- starcounts[!row.names(starcounts)%in%rRNAgenes[,1],]
dim(starcounts)
## [1] 46904     8
dim(starcounts.norRNA)
## [1] 46881     8
dim(rRNAgenes)
## [1] 23  1
rm(starcounts) #clean up
dds = DESeqDataSetFromMatrix(countData = starcounts.norRNA, 
                             colData = coldata, 
                             design = ~ condition )
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
dds
## class: DESeqDataSet 
## dim: 46881 8 
## metadata(1): version
## assays(1): counts
## rownames(46881): WBGene00197333 WBGene00198386 ... WBGene00010967
##   WBGene00014473
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#pre-filter to remove genes with too few counts
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds
## class: DESeqDataSet 
## dim: 14156 8 
## metadata(1): version
## assays(1): counts
## rownames(14156): WBGene00015153 WBGene00002061 ... WBGene00010966
##   WBGene00010967
## rowData names(0):
## colnames(8): N1_S5 N2_S6 ... SG3_S3 SG4_S4
## colData names(2): sample condition
#check quickly the distribution of the other counts
quantile(rowSums(counts(dds)))
##        0%       25%       50%       75%      100% 
##     10.00     51.00    363.00   2162.25 240054.00
#and in more detail
quantile(rowSums(counts(dds)), probs=seq(0.9,1,0.005))
##        90%      90.5%        91%      91.5%        92%      92.5%        93% 
##   5137.500   5279.275   5485.100   5740.000   6012.600   6230.750   6569.650 
##      93.5%        94%      94.5%        95%      95.5%        96%      96.5% 
##   6916.775   7291.800   7736.800   8108.000   8734.175   9302.200  10166.250 
##        97%      97.5%        98%      98.5%        99%      99.5%       100% 
##  11065.100  12575.500  14494.600  18757.400  25663.900  37461.975 240054.000
#show the genes that still have very large counts
i<- rowSums(counts(dds)) >= 100000
assay(dds)[i,]
##                N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3 SG4_S4
## WBGene00000915 13312 12611 10603 15602  12740  10132  12254  14820
## WBGene00006929 18386 19388 18132 20877  18804  15398  18484  23763
## WBGene00006926 27452 26867 26905 28783  20753  17046  19889  25582
## WBGene00006930 29295 29066 28874 31287  30451  24642  29171  37268
## WBGene00001167 11896 10835 10303 14872  18855  12461  18973  21002
## WBGene00001168 21195 20607 20916 26436  23787  17016  25938  33677
#Check correlation between raw gene expression vectors for all samples (without log transformation)
pairs(assay(dds))

#and following log transformation
pairs(log(assay(dds)+0.5)) #add 0.5 as pseudocount

cor.mat <- cor(assay(dds))
cor.mat
##            N1_S5     N2_S6     N3_S7     N4_S8    SG1_S1    SG2_S2    SG3_S3
## N1_S5  1.0000000 0.9876924 0.9951246 0.9802646 0.9524706 0.9692726 0.9512352
## N2_S6  0.9876924 1.0000000 0.9900583 0.9869876 0.9472874 0.9698440 0.9392737
## N3_S7  0.9951246 0.9900583 1.0000000 0.9796024 0.9443960 0.9618507 0.9429706
## N4_S8  0.9802646 0.9869876 0.9796024 1.0000000 0.9705380 0.9718217 0.9649730
## SG1_S1 0.9524706 0.9472874 0.9443960 0.9705380 1.0000000 0.9881489 0.9968161
## SG2_S2 0.9692726 0.9698440 0.9618507 0.9718217 0.9881489 1.0000000 0.9817127
## SG3_S3 0.9512352 0.9392737 0.9429706 0.9649730 0.9968161 0.9817127 1.0000000
## SG4_S4 0.9579343 0.9485628 0.9532682 0.9725760 0.9954057 0.9829288 0.9972798
##           SG4_S4
## N1_S5  0.9579343
## N2_S6  0.9485628
## N3_S7  0.9532682
## N4_S8  0.9725760
## SG1_S1 0.9954057
## SG2_S2 0.9829288
## SG3_S3 0.9972798
## SG4_S4 1.0000000
pheatmap(cor.mat, legend=T)

#This all looks good - Ns correlate better with each other than with SGs and vice versa.

#Transform the counts 
vsd <- vst(dds, blind=TRUE)
plotPCA(vsd, intgroup=c("condition"))

#As expected the PCA is not changed much because we only removed a few genes

#try rlog
rld <- rlog(dds, blind = TRUE)
plotPCA(rld, intgroup=c("condition"))

#we can improve the plot a bit by highlighting the replicate number in different shape
pcaData <- plotPCA(vsd, intgroup=c("condition", "sample"), returnData=TRUE)
pcaData$replicate <- sub(pattern= ".*(.)_.*", 
                         replacement="\\1", 
                         perl=TRUE, fixed=FALSE, 
                         x=pcaData$sample)
pcaData
##               PC1        PC2           group condition sample   name replicate
## N1_S5  -10.515292  3.9809806   control:N1_S5   control  N1_S5  N1_S5         1
## N2_S6  -11.343047 -4.3937220   control:N2_S6   control  N2_S6  N2_S6         2
## N3_S7  -10.735343  3.8215511   control:N3_S7   control  N3_S7  N3_S7         3
## N4_S8   -8.883619 -3.2672766   control:N4_S8   control  N4_S8  N4_S8         4
## SG1_S1  10.795760 -0.7681986 neurexin:SG1_S1  neurexin SG1_S1 SG1_S1         1
## SG2_S2   8.886207 -2.7062952 neurexin:SG2_S2  neurexin SG2_S2 SG2_S2         2
## SG3_S3  11.160175  1.7500821 neurexin:SG3_S3  neurexin SG3_S3 SG3_S3         3
## SG4_S4  10.635157  1.5828787 neurexin:SG4_S4  neurexin SG4_S4 SG4_S4         4
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=condition, shape=replicate)) +
    geom_point(size=3) +
    xlab(paste0("PC1: ",percentVar[1],"% variance")) +
    ylab(paste0("PC2: ",percentVar[2],"% variance")) +
    coord_fixed() +
    ggtitle(label="STAR counts, transformed with vsd")

pcaData <- myPlotPCA(vsd, pc1= 2, pc2= 3, intgroup=c("condition", "sample"), returnData=TRUE)
pcaData$replicate <- sub(pattern= ".*(.)_.*", 
                         replacement="\\1", 
                         perl=TRUE, fixed=FALSE, 
                         x=pcaData$sample)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=condition, shape=replicate)) +
    geom_point(size=3) +
    xlab(paste0("PC2: ",percentVar[1],"% variance")) +
    ylab(paste0("PC3: ",percentVar[2],"% variance")) +
    coord_fixed()

#Check correlation between transformed gene expression vectors for all samples
pairs(assay(vsd))

cor.mat <- cor(assay(vsd))
cor.mat
##            N1_S5     N2_S6     N3_S7     N4_S8    SG1_S1    SG2_S2    SG3_S3
## N1_S5  1.0000000 0.9897970 0.9926442 0.9895144 0.9778167 0.9812096 0.9779625
## N2_S6  0.9897970 1.0000000 0.9904951 0.9911871 0.9752235 0.9819031 0.9726985
## N3_S7  0.9926442 0.9904951 1.0000000 0.9884828 0.9758348 0.9803706 0.9760524
## N4_S8  0.9895144 0.9911871 0.9884828 1.0000000 0.9836709 0.9840315 0.9817861
## SG1_S1 0.9778167 0.9752235 0.9758348 0.9836709 1.0000000 0.9919673 0.9944142
## SG2_S2 0.9812096 0.9819031 0.9803706 0.9840315 0.9919673 1.0000000 0.9898898
## SG3_S3 0.9779625 0.9726985 0.9760524 0.9817861 0.9944142 0.9898898 1.0000000
## SG4_S4 0.9789728 0.9741121 0.9774286 0.9830008 0.9947647 0.9907127 0.9951929
##           SG4_S4
## N1_S5  0.9789728
## N2_S6  0.9741121
## N3_S7  0.9774286
## N4_S8  0.9830008
## SG1_S1 0.9947647
## SG2_S2 0.9907127
## SG3_S3 0.9951929
## SG4_S4 1.0000000
pheatmap(cor.mat, legend=T)

Next, we carry out differential gene expression based on the gene counts from STAR mapping. Note that this approach is now pretty much deprecated and replaced by transcript-based analysis summarised at the gene level so we won't really analyse the results here but add them here for completion.

#BiocManager::install("apeglm")
library(apeglm) #to use in the lfcShrink function

dds <- DESeq(dds) #estimates size factors and dispersions and fits the model
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds)
res
## log2 fold change (MLE): condition neurexin vs control 
## Wald test p-value: condition neurexin vs control 
## DataFrame with 14156 rows and 6 columns
##                        baseMean      log2FoldChange             lfcSE
##                       <numeric>           <numeric>         <numeric>
## WBGene00015153 1.40026331133511    1.23369435011502  1.25556206618441
## WBGene00002061 1023.39358270591 -0.0561810129305647 0.111894106778005
## WBGene00001177 236.878096958743   0.645083433822109 0.130496798969703
## WBGene00021412 24.8706101313319   0.267994653237935 0.287508181787133
## WBGene00018976 6.03204079332923  0.0830540521723233 0.559216117300581
## ...                         ...                 ...               ...
## WBGene00010963 646.235482854105  0.0560344400948453 0.532154321210568
## WBGene00010964 6610.73359986534  -0.301756164128778 0.339749774394959
## WBGene00010965 698.469898642392 -0.0714973118809531 0.592928262802426
## WBGene00010966 54.8204097454485   0.488445620632377 0.207379835476911
## WBGene00010967 2519.35135960664   0.508179713782424 0.122171381252483
##                              stat               pvalue                 padj
##                         <numeric>            <numeric>            <numeric>
## WBGene00015153   0.98258332530239    0.325812554464328                   NA
## WBGene00002061 -0.502090901373623    0.615603580096765    0.755603864950982
## WBGene00001177   4.94328932904997 7.68153514674162e-07 1.49941766583249e-05
## WBGene00021412   0.93212878872558     0.35126997168092    0.521797355147636
## WBGene00018976  0.148518702524594    0.881933427153093    0.933540011973405
## ...                           ...                  ...                  ...
## WBGene00010963  0.105297350526771    0.916139865263089    0.954440625434673
## WBGene00010964 -0.888171786621956    0.374448352556327    0.544673615403436
## WBGene00010965 -0.120583410112763    0.904021009760152    0.946326013043526
## WBGene00010966   2.35531877778328   0.0185068216532182   0.0590552767545967
## WBGene00010967    4.1595642823437 3.18855275260321e-05 0.000347870583450949
#next we want to shrink the LFC to help visualisation and ranking of genes
resultsNames(dds)
## [1] "Intercept"                     "condition_neurexin_vs_control"
resLFC <- lfcShrink(dds, coef="condition_neurexin_vs_control", type="apeglm")
## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
##     Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
##     sequence count data: removing the noise and preserving large differences.
##     Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
resLFC
## log2 fold change (MAP): condition neurexin vs control 
## Wald test p-value: condition neurexin vs control 
## DataFrame with 14156 rows and 5 columns
##                        baseMean      log2FoldChange             lfcSE
##                       <numeric>           <numeric>         <numeric>
## WBGene00015153 1.40026331133511  0.0865693105221461 0.334873921280774
## WBGene00002061 1023.39358270591 -0.0501595379594478 0.106187975503463
## WBGene00001177 236.878096958743     0.6101360636599 0.131419234864905
## WBGene00021412 24.8706101313319    0.16047086366021 0.231989548013496
## WBGene00018976 6.03204079332923  0.0219340702385293 0.285509651993327
## ...                         ...                 ...               ...
## WBGene00010963 646.235482854105  0.0165071893007835 0.281593365688285
## WBGene00010964 6610.73359986534   -0.15473391499059  0.25612671632153
## WBGene00010965 698.469898642392 -0.0161784771837534 0.289572855203541
## WBGene00010966 54.8204097454485   0.398418895186059 0.203231809685816
## WBGene00010967 2519.35135960664   0.476542635901622 0.122242317594528
##                              pvalue                 padj
##                           <numeric>            <numeric>
## WBGene00015153    0.325812554464328                   NA
## WBGene00002061    0.615603580096765    0.755603864950982
## WBGene00001177 7.68153514674162e-07 1.49941766583249e-05
## WBGene00021412     0.35126997168092    0.521797355147636
## WBGene00018976    0.881933427153093    0.933540011973405
## ...                             ...                  ...
## WBGene00010963    0.916139865263089    0.954440625434673
## WBGene00010964    0.374448352556327    0.544673615403436
## WBGene00010965    0.904021009760152    0.946326013043526
## WBGene00010966   0.0185068216532182   0.0590552767545967
## WBGene00010967 3.18855275260321e-05 0.000347870583450949
#MA plot
plotMA(resLFC, ylim=c(-2,2))

#contrast with the same plot but for the unshrunken log fold changes
plotMA(res, ylim=c(-2,2))

#plot top few genes
resLFCOrdered <- resLFC[order(resLFC$pvalue),]
resLFCOrdered[1:10,]
## log2 fold change (MAP): condition neurexin vs control 
## Wald test p-value: condition neurexin vs control 
## DataFrame with 10 rows and 5 columns
##                        baseMean    log2FoldChange             lfcSE
##                       <numeric>         <numeric>         <numeric>
## WBGene00008862  222.57019613529  5.57679743055826 0.289049887335795
## WBGene00006541 816.161213344128 -1.93485791500237 0.102347519609402
## WBGene00016627 533.260899008354 -2.19269946224634 0.118338343000403
## WBGene00000657 369.382836201042  2.89687520175153 0.167550806546769
## WBGene00000712 519.893685556946  2.72103817505864 0.171840363835382
## WBGene00020218 263.048181538241 -2.09074236696147 0.132190159983885
## WBGene00016453 332.089024007193  -1.9220886364233 0.124163319245512
## WBGene00000703 450.872355408933  2.65300765292485 0.174067247594477
## WBGene00018393 711.229164789001 -1.68439569805028 0.117303671006414
## WBGene00022644 337.023507140299  1.58598181801307 0.115549105174495
##                              pvalue                 padj
##                           <numeric>            <numeric>
## WBGene00008862 3.35503734507776e-83 4.47293578845766e-79
## WBGene00006541 6.37282902848922e-81 4.24812783039092e-77
## WBGene00016627 6.63788957158305e-78 2.94987812561151e-74
## WBGene00000657 2.60644403051886e-68 8.68727795371936e-65
## WBGene00000712 7.67188409824923e-58 2.04563117595718e-54
## WBGene00020218 1.40135503067076e-57 3.11381087815043e-54
## WBGene00016453 2.85659809132675e-55 5.44059510765261e-52
## WBGene00000703 7.94087824248563e-54 1.32334735911023e-50
## WBGene00018393 6.21843170000284e-48 9.21157015827087e-45
## WBGene00022644 5.18169923397144e-44 6.90824141873072e-41
rm(resLFC)
for (i in 1:10) {
 plotCounts(dds, gene=rownames(resLFCOrdered)[i], intgroup="condition")
}

Further exploration of the data using the pcaExplorer package in R (just trying out - DO NOT INCLUDE)

#devtools::install_github("rstudio/d3heatmap") #doesn't install otherwise
#BiocManager::install("pcaExplorer")

library(pcaExplorer)

#need a mapping from Ensembl ids to gene names; will use a t2g subset; row names must map gene ids
t2g.sub <- t2g[,c("ens_gene", "ext_gene")]
t2g.sub<- dplyr::distinct(t2g.sub)
rownames(t2g.sub) <- t2g.sub[,"ens_gene"]
colnames(t2g.sub) <- c("ens_gene", "gene_name")

#pcaExplorer(dds = dds, dst = vsd, annotation= t2g.sub)

#Conclusion: Good for quick exploration but the output downloaded plot for loadings did not agree with the one shown online so I'm not sure how buggy it is....(on another trial it worked...)
#Will explore loadings outside this...

Alt text

Further exploration of the data using the PCAtools package from Bioconductor

#BiocManager::install('PCAtools')

library(PCAtools)
## Loading required package: reshape2
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
## Loading required package: lattice
## 
## Attaching package: 'lattice'
## The following object is masked from 'package:enrichplot':
## 
##     dotplot
## The following object is masked from 'package:clusterProfiler':
## 
##     dotplot
## 
## Attaching package: 'PCAtools'
## The following objects are masked from 'package:stats':
## 
##     biplot, screeplot
p<- pca(assay(vsd), metadata=coldata)

screeplot(p)
## Warning: Removed 2 row(s) containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_point).

biplot(p, colby="condition")

pairsplot(p, colby= "condition")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

plotloadings(p)
## -- variables retained:
## WBGene00008862, WBGene00020218, WBGene00006541, WBGene00016627, WBGene00021468, WBGene00000716, WBGene00000656, WBGene00195017, WBGene00195016, WBGene00010314, WBGene00271656, WBGene00010965, WBGene00045168, WBGene00077652, WBGene00016534, WBGene00013699, WBGene00016457, WBGene00010215, WBGene00001789, WBGene00044922, WBGene00017661, WBGene00009523, WBGene00011345, WBGene00011974, WBGene00022240, WBGene00303364, WBGene00009129

loadings(p)[which.max(abs(loadings(p)$PC1)),]
##                       PC1          PC2        PC3         PC4         PC5
## WBGene00008862 0.09067169 -0.008180751 0.03666657 0.003266629 -0.03149979
##                         PC6          PC7         PC8
## WBGene00008862 -0.000739828 0.0002908326 0.002464358
loadings(p)[which.max(abs(loadings(p)$PC2)),]
##                          PC1         PC2         PC3          PC4           PC5
## WBGene00010965 -0.0002111257 -0.07537175 -0.00877533 -0.006426331 -0.0006274245
##                        PC6          PC7          PC8
## WBGene00010965 0.004823226 -0.005803499 -0.009104016

Observations from PCA loadings plot:

  • Largest contribution to PC1 is from gene WBGene00008862 (F15D4.5). There is very little annotation for this gene, except that it plays some regulatory role and interacts with deps-1, a gene localized to P granules. HHblits hits almost exclusively uncharacterised proteins but the C-terminal seems to map to CCHC-type domains (which are zinc fingers). It also hits some Pro-Pol polyproteins across a much larger part of the protein sequence (again at the C-terminus)
  • Large negative loadings for PC1: WBGene00016627 (C44B7.5), an orthologue of human ferric chelate reductase 1. Human orthologues of this gene are implicated in early infantile epileptic encephalopathy 37. WBGene00021468 (epg-2), involved in macroautophagy and negative regulation of autophagosome assmebly.
  • Largest (negative) loading for PC2: WBGene00010965 (ctc-2), cytochrome c oxidase activity, affected by daf-2

Differential expression analysis based on isoform quantification from kallisto

Dealing with ribosomal RNAs

Aine's work showed a huge number of reads mapping to rRNA. I've decided it's probably best to remove them BEFORE proceeding with anything else so we don't have to worry about them later. I saw that many people use bwa or bowtie2 for this but rRNAs of eukaryotes can also have introns so would need a spliced aligner. Other solutions; BBsplit from BBmap can split reads from different references but does it based on k-mers. SortMeRNA also removes reads that map to a set of sequences (used mostly with prokaryotes and metagenomes..would need to check if it works well with eukaryotes).

I decided in the end that the easiest solution might be to rebuild the index that kallisto uses and re-run kallisto so that the rRNAs are not part of the reference transcriptome, allowing the reads to remain unmapped (hopefully!).

We need to remove all sequences with rRNA transcript_biotype reads (there are 23) - the genome/transcriptome files are from Ensembl and can be found in this directory: /d/in16/u/ubcg71a/research/genomes/c.elegans

 zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | grep biotype:rRNA > list_of_biotype_rRNA_transcripts.txt

Found a one-liner online to do this easily (basically, stitches the lines together separated by "", greps and then separates again with newlines:

zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | awk '{ if ((NR>1)&&($0~/^>/)) { printf("\n%s", $0); } else if (NR==1) { printf("%s", $0); } else { printf("\t%s", $0); } }' | grep -vFf pattern.txt - | tr "\t" "\n" > Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa

Checking:

grep ">" Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa | wc
61428
zcat Caenorhabditis_elegans.WBcel235.cdna_and_ncrna.fa.gz | grep ">" | wc
61451

wc list_of_biotype_rRNA_transcripts.txt
23

Then zip the final .fa file:

gzip -9 Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa

Indexing the transcriptome with kallisto

(all transcriptome files are under: /d/in16/u/ubcg71a/research/genomes/c.elegans See README for how they were obtained):

/d/in7/s/kallisto/kallisto_linux-v0.46.1/kallisto index -i Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.index Caenorhabditis_elegans.WBcel235.cdna_and_ncrna_norRNA.fa.gz

Pseudo-mapping with kallisto

This blog: http://genomespot.blogspot.com/2015/08/how-accurate-is-kallisto.html suggests that the difference between trimming before kallisto or not trimming does not make a huge difference overall.

So I did not run Trimmomatic on top of fastp as I would have done if we were mapping with STAR or similar.

Running first without pseudoalignment (which takes a lot more time and space) but with bootstrapping (this is run in the directory $PROJECT/kallisto_with_ncRNA_norRNA/):

nohup ./run_kallisto.sh ../fastp_merged/ N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3 SG4_S4 >& run_kallisto.out &

I got initially rather low mapping rates (about 20%) and was worried I had the orientation of the reads wrong. I reversed the orientation and that gave less than 1% alignment so that was clearly not the problem. In this forum: https://github.com/pachterlab/kallisto/issues/198 there is a suggestion that the culprit is ncRNAs that are not included in the transcriptome file, which would make sense since we know that 80% of the reads are to rRNA in this case.

I re-did the run with a transcriptome that had ncRNA too. Now, alignment jumped to 98.2% so the ncRNA missing was definitely the problem here (rRNA really..) These results were under kallisto_with_ncRNA, a directory that I eventually deleted to save space. The results without ncRNA are now renamed to kallisto_cDNA NOTE: the abundance estimates for the bootstrap are not written by default to the csv file but they can be written out using:

/d/in7/s/kallisto/kallisto_linux-v0.46.1/kallisto h5dump -o output abundance.h5

(where output is a directory where the results will be saved)

I will probably keep both versions and run sleuth with both to see whether there are any differences that affect the coding genes too.

Finally, I re-did the run with a transcriptome that had ncRNA but no rRNA. I then deleted the previous run that had ncRNA, including rRNAs to save space. So the kallisto_with_ncRNA directory has now been replaced with: kallisto_with_ncRNA_norRNA Alignment rates: p_unique: 9-10%, p_pseudoaligned: ~20%

I also did one re-run with ncRNA and no bootstrap but asking for pseudo-alignment to the genome so that we could get pseudo-bam files for visualisation: In directory: /d/in16/u/ubcg71a/research/filipe/kallisto_with_ncRNA_norRNA/pseudo_bam/

nohup ./run_kallisto_pseudobam.sh ../../fastp_merged/ N1_S5 N2_S6 N3_S7 N4_S8 SG1_S1 SG2_S2 SG3_S3 SG4_S4 >& run_kallisto.out &

Gene-level differential expression using kallisto counts and sleuth

This approach is following the code from: https://github.com/pachterlab/aggregationDE/blob/master/R/gene_pipeline and the procedure of aggregating p-values after transcript differential expression has been carried out to call differentially expressed genes (see paper by Yi et al: doi: 10.1186/s13059-018-1419-z).

Setup directories and variables

'%!in%' <- function(x,y)!('%in%'(x,y))

setwd("~/Dropbox/work/research/Filipe_neurexin/NEW_PROCESS/")

#sample_id <- dir(file.path(".", "kallisto_cdna"))
sample_id <- dir(file.path(".", "kallisto_with_ncRNA_norRNA"))
sample_id
## [1] "N1_S5"  "N2_S6"  "N3_S7"  "N4_S8"  "SG1_S1" "SG2_S2" "SG3_S3" "SG4_S4"
#kal_dirs <- file.path(".", "kallisto_cdna", sample_id)
kal_dirs <- file.path(".", "kallisto_with_ncRNA_norRNA", sample_id)

Read in metadata

Sleuth requires metadata information corresponding to the samples in the dataset. In this case, the metadata is very simple and was already created outside the R script. We simply need to add to this table the directories where kallisto results are saved.

s2c <- read.table("./metadata.txt",  header = TRUE, stringsAsFactors=FALSE)
s2c
##   sample condition
## 1  N1_S5   control
## 2  N2_S6   control
## 3  N3_S7   control
## 4  N4_S8   control
## 5 SG1_S1  neurexin
## 6 SG2_S2  neurexin
## 7 SG3_S3  neurexin
## 8 SG4_S4  neurexin
s2c <- dplyr::mutate(s2c, path = kal_dirs)
s2c
##   sample condition                                path
## 1  N1_S5   control  ./kallisto_with_ncRNA_norRNA/N1_S5
## 2  N2_S6   control  ./kallisto_with_ncRNA_norRNA/N2_S6
## 3  N3_S7   control  ./kallisto_with_ncRNA_norRNA/N3_S7
## 4  N4_S8   control  ./kallisto_with_ncRNA_norRNA/N4_S8
## 5 SG1_S1  neurexin ./kallisto_with_ncRNA_norRNA/SG1_S1
## 6 SG2_S2  neurexin ./kallisto_with_ncRNA_norRNA/SG2_S2
## 7 SG3_S3  neurexin ./kallisto_with_ncRNA_norRNA/SG3_S3
## 8 SG4_S4  neurexin ./kallisto_with_ncRNA_norRNA/SG4_S4

Using biomaRt to map transcript ids to gene names and descriptions

We will get gene names from biomaRt to add to the transcript IDs we have from sleuth

ens <- useMart("ensembl")
attr = c("ensembl_gene_id", 
         "ensembl_transcript_id",
         "description",
         "external_gene_name",
         "entrezgene_id")
celeg <- useMart("ensembl",
                   dataset = "celegans_gene_ensembl")

t2g <- getBM(attributes = attr,
              mart = celeg)
## Warning: `select_()` is deprecated as of dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `filter_()` is deprecated as of dplyr 0.7.0.
## Please use `filter()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Cache found
t2g <- dplyr::rename(t2g, 
                     target_id = ensembl_transcript_id,
                     ens_gene = ensembl_gene_id, 
                     ext_gene = external_gene_name)

Differential expression with sleuth

We first carry out differential expression with sleuth, summarised at the gene level (using p-value aggregation).

#### GENE-LEVEL-ANALYSIS ####
#Following code from: https://github.com/pachterlab/aggregationDE/blob/master/R/gene_pipeline.R
design <- ~ condition
so <- sleuth_prep(s2c,
                  full_model = design,
                  read_bootstrap_tpm = TRUE,
                  extra_bootstrap_summary=TRUE,
                  target_mapping = t2g,
                  transform_fun_counts = function(x) log2(x + 0.5), #change the natural log fold change to log2 fold change
                  aggregation_column = 'ens_gene'
                  #filter_target_id=txfilter
                  )
## Warning in check_num_cores(num_cores): It appears that you are running Sleuth from within Rstudio.
## Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
## If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
## reading in kallisto results
## dropping unused factor levels
## ........
## normalizing est_counts
## 15845 targets passed the filter
## normalizing tpm
## merging in metadata
## summarizing bootstraps
## ........
so <- sleuth_fit(so, ~condition, 'full')
## fitting measurement error models
## shrinkage estimation
## 4 NA values were found during variance shrinkage estimation due to mean observation values outside of the range used for the LOESS fit.
## The LOESS fit will be repeated using exact computation of the fitted surface to extrapolate the missing values.
## These are the target ids with NA values: T23E7.4a.2, C04F6.1.1, C42D8.2a.1, K07H8.6a.1
## computing variance of betas
#do a Wald test first
so <- sleuth_wt(so, which_beta='conditionneurexin', which_model='full')
sleuth_table_wt <- sleuth_results(so, 
                                  test= 'conditionneurexin', 
                                  test_type =  'wt',
                                  which_model = 'full',
                                  show_all = TRUE)
dim(sleuth_table_wt)
## [1] 46904     8
#how many of these genes have no NAs in q-value? (these are the ones that get filtered when show_all is FALSE)
dim(filter(sleuth_table_wt, !is.na(qval)))
## [1] 11337     8
sleuth_table_wt[1:5,]
##        target_id                                                 description
## 1 WBGene00016627                                                            
## 2 WBGene00008862                                                            
## 3 WBGene00020218                                                            
## 4 WBGene00016453 Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5 WBGene00000657              COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]
##   ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1  C44B7.5        174124                          1            8.681924
## 2  F15D4.5        174969                          1            5.984510
## 3  T04G9.7        180424                          1            7.699620
## 4    vet-2        172993                          1            8.088581
## 5   col-81        174686                          1            7.883827
##           pval         qval
## 1 7.057950e-85 8.001598e-81
## 2 1.613360e-81 9.145333e-78
## 3 1.576815e-60 5.958785e-57
## 4 1.408001e-49 3.990628e-46
## 5 3.739522e-46 8.478992e-43
sleuth_significant_wt <- dplyr::filter(sleuth_table_wt, qval <= 0.05)
#significant genes: 2328
dim(sleuth_significant_wt)
## [1] 2328    8
head(sleuth_significant_wt, 20)
##         target_id
## 1  WBGene00016627
## 2  WBGene00008862
## 3  WBGene00020218
## 4  WBGene00016453
## 5  WBGene00000657
## 6  WBGene00022644
## 7  WBGene00012557
## 8  WBGene00003977
## 9  WBGene00010158
## 10 WBGene00010808
## 11 WBGene00007196
## 12 WBGene00018605
## 13 WBGene00017648
## 14 WBGene00000301
## 15 WBGene00018138
## 16 WBGene00009897
## 17 WBGene00018393
## 18 WBGene00021005
## 19 WBGene00011432
## 20 WBGene00000703
##                                                                             description
## 1                                                                                      
## 2                                                                                      
## 3                                                                                      
## 4                           Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5                                        COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 6  Downstream Of DAF-16 (Regulated by DAF-16)  [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 7                                                                                      
## 8        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 9        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 10                                                                                     
## 11                                                                                     
## 12                                                                                     
## 13                      D-aspartate oxidase 3  [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 14                                 Caveolin-1  [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 15                  Folate-like transporter 2  [Source:UniProtKB/Swiss-Prot;Acc:O45166]
## 16                          Peroxidase skpo-1  [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 17               Methionine Sulfoxide Reductase A  [Source:UniProtKB/TrEMBL;Acc:O02089]
## 18             Uterine Lumin Expressed/locailized  [Source:UniProtKB/TrEMBL;Acc:X5M5P2]
## 19             SKN-1 Dependent Zygotic transcript  [Source:UniProtKB/TrEMBL;Acc:O02297]
## 20                                       COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q21556]
##     ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1    C44B7.5        174124                          1            8.681924
## 2    F15D4.5        174969                          1            5.984510
## 3    T04G9.7        180424                          1            7.699620
## 4      vet-2        172993                          1            8.088581
## 5     col-81        174686                          1            7.883827
## 6     dod-19        178564                          1            8.195626
## 7  Y37D8A.19        176711                          1            9.265287
## 8    pes-2.1        173025                          1            6.538641
## 9    pes-2.2        173026                          1            6.538641
## 10    sepa-1        173196                          1            6.528977
## 11   B0513.4        178351                          1            7.571620
## 12   F48E3.4        181006                          1            8.878642
## 13     ddo-3        184746                          1            8.419052
## 14     cav-1        177815                          1            7.539687
## 15    folt-2        178745                          1           10.118191
## 16    skpo-1        174340                          1            9.551359
## 17    msra-1        185709                          2           16.353021
## 18     ule-1        171778                          1            8.688486
## 19    sdz-30        173198                          1            5.798016
## 20   col-129        178155                          1            8.242931
##            pval         qval
## 1  7.057950e-85 8.001598e-81
## 2  1.613360e-81 9.145333e-78
## 3  1.576815e-60 5.958785e-57
## 4  1.408001e-49 3.990628e-46
## 5  3.739522e-46 8.478992e-43
## 6  6.131492e-41 1.158545e-37
## 7  3.821684e-39 6.189491e-36
## 8  4.000704e-38 5.039554e-35
## 9  4.000704e-38 5.039554e-35
## 10 8.901306e-38 1.009141e-34
## 11 1.780045e-37 1.834579e-34
## 12 1.672903e-36 1.580475e-33
## 13 6.915372e-36 6.030737e-33
## 14 1.116744e-33 9.043235e-31
## 15 3.081094e-32 2.328690e-29
## 16 5.041138e-31 3.571961e-28
## 17 4.419597e-30 2.947351e-27
## 18 2.273631e-29 1.432008e-26
## 19 2.532942e-29 1.511367e-26
## 20 5.864490e-29 3.324286e-26
#and then a LRT test
so <- sleuth_fit(so, ~1, 'null')
## fitting measurement error models
## shrinkage estimation
## 4 NA values were found during variance shrinkage estimation due to mean observation values outside of the range used for the LOESS fit.
## The LOESS fit will be repeated using exact computation of the fitted surface to extrapolate the missing values.
## These are the target ids with NA values: T23E7.4a.2, C04F6.1.1, C42D8.2a.1, K07H8.6a.1
## computing variance of betas
so <- sleuth_lrt(so, 'null', 'full')
sleuth_table_lrt <- sleuth_results(so, test='null:full', 'lrt')
sleuth_table_lrt[1:5,]
##        target_id
## 1 WBGene00006436
## 2 WBGene00008862
## 3 WBGene00016627
## 4 WBGene00018393
## 5 WBGene00020218
##                                                              description
## 1                Titin homolog  [Source:UniProtKB/Swiss-Prot;Acc:G4SLH0]
## 2                                                                       
## 3                                                                       
## 4 Methionine Sulfoxide Reductase A  [Source:UniProtKB/TrEMBL;Acc:O02089]
## 5                                                                       
##   ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1    ttn-1        266969                         15           74.087167
## 2  F15D4.5        174969                          1            5.984510
## 3  C44B7.5        174124                          1            8.681924
## 4   msra-1        185709                          2           16.353021
## 5  T04G9.7        180424                          1            7.699620
##           pval         qval
## 1 5.755169e-09 1.631159e-05
## 2 3.530000e-09 1.631159e-05
## 3 5.541369e-09 1.631159e-05
## 4 2.585526e-09 1.631159e-05
## 5 8.194098e-09 1.857930e-05
sleuth_significant_lrt <- dplyr::filter(sleuth_table_lrt, qval <= 0.05)
dim(sleuth_significant_lrt)
## [1] 2251    8
head(sleuth_significant_lrt, 20)
##         target_id
## 1  WBGene00006436
## 2  WBGene00008862
## 3  WBGene00016627
## 4  WBGene00018393
## 5  WBGene00020218
## 6  WBGene00016453
## 7  WBGene00013567
## 8  WBGene00022644
## 9  WBGene00009007
## 10 WBGene00009897
## 11 WBGene00018602
## 12 WBGene00000657
## 13 WBGene00007196
## 14 WBGene00017648
## 15 WBGene00010808
## 16 WBGene00000301
## 17 WBGene00003977
## 18 WBGene00010158
## 19 WBGene00010204
## 20 WBGene00010822
##                                                                             description
## 1                               Titin homolog  [Source:UniProtKB/Swiss-Prot;Acc:G4SLH0]
## 2                                                                                      
## 3                                                                                      
## 4                Methionine Sulfoxide Reductase A  [Source:UniProtKB/TrEMBL;Acc:O02089]
## 5                                                                                      
## 6                           Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 7                                                                                      
## 8  Downstream Of DAF-16 (Regulated by DAF-16)  [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 9       OTUBain deubiquitylating protease homolog  [Source:UniProtKB/TrEMBL;Acc:Q19681]
## 10                          Peroxidase skpo-1  [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 11                                                                                     
## 12                                       COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 13                                                                                     
## 14                      D-aspartate oxidase 3  [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 15                                                                                     
## 16                                 Caveolin-1  [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 17       Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 18       Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 19                                                                                     
## 20                                                                                     
##     ext_gene entrezgene_id num_aggregated_transcripts sum_mean_obs_counts
## 1      ttn-1        266969                         15           74.087167
## 2    F15D4.5        174969                          1            5.984510
## 3    C44B7.5        174124                          1            8.681924
## 4     msra-1        185709                          2           16.353021
## 5    T04G9.7        180424                          1            7.699620
## 6      vet-2        172993                          1            8.088581
## 7  Y75B12B.1        190716                          3           23.328509
## 8     dod-19        178564                          1            8.195626
## 9     otub-3        177679                          3           23.840400
## 10    skpo-1        174340                          1            9.551359
## 11   F48D6.4        180697                          3           15.152133
## 12    col-81        174686                          1            7.883827
## 13   B0513.4        178351                          1            7.571620
## 14     ddo-3        184746                          1            8.419052
## 15    sepa-1        173196                          1            6.528977
## 16     cav-1        177815                          1            7.539687
## 17   pes-2.1        173025                          1            6.538641
## 18   pes-2.2        173026                          1            6.538641
## 19   F57F5.1        179645                          1           10.940901
## 20  M01G12.9        187385                          1            5.192433
##            pval         qval
## 1  5.755169e-09 1.631159e-05
## 2  3.530000e-09 1.631159e-05
## 3  5.541369e-09 1.631159e-05
## 4  2.585526e-09 1.631159e-05
## 5  8.194098e-09 1.857930e-05
## 6  1.688221e-08 3.189893e-05
## 7  2.016994e-08 3.266666e-05
## 8  2.598034e-08 3.681739e-05
## 9  3.209199e-08 4.042521e-05
## 10 4.835407e-08 5.481901e-05
## 11 7.432027e-08 7.659717e-05
## 12 1.101088e-07 9.775006e-05
## 13 1.143796e-07 9.775006e-05
## 14 1.207110e-07 9.775006e-05
## 15 1.401189e-07 1.059018e-04
## 16 2.276656e-07 1.206595e-04
## 17 2.447886e-07 1.206595e-04
## 18 2.447886e-07 1.206595e-04
## 19 2.350286e-07 1.206595e-04
## 20 2.052485e-07 1.206595e-04
#Note: The Wald test usually returns more hits than the likelihood ratio test

#check counts for top significant genes in WT
selected_gene <- sleuth_significant_wt$target_id[1]
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene==selected_gene))$target_id[1], units = "est_counts", color_by = "condition")

selected_gene <- sleuth_significant_wt$target_id[2]
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene==selected_gene))$target_id[1], units = "est_counts", color_by = "condition")

selected_gene <- sleuth_significant_wt$target_id[3]
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene==selected_gene))$target_id[1], units = "est_counts", color_by = "condition")

#save the gene differential expression results so they can be used with web servers for further analysis

#check the table has no duplicate entries
length(unique(sleuth_table_wt$target_id)) == length(sleuth_table_wt$target_id)
## [1] TRUE
write.table(sleuth_significant_wt$target_id, file="results/sleuth_significant_q005_NEW.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")

#avoid using this - pointless
write.table(dplyr::filter(sleuth_table_wt, qval > 0.05)$target_id, file="results/bg_genes_good.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")
#use this instead
write.table(dplyr::filter(sleuth_table_wt, !is.na(qval))$target_id, file="results/bg_genes_all.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")

Next, we extract differential expression results from the same sleuth object but this time we do not aggregate p-values so our target IDs are transcript IDs.

##### TRANSCRIPT-LEVEL ANALYSIS #####
#To do the transcript-level analysis, simply set the pval_aggregate to FALSE
#NOTE that target_id will now be the transcript id rather than the gene id
#We will use the Wald test only from here on
sleuth_table_txn <- sleuth_results(so, 
                                   test = 'conditionneurexin',
                                   test_type = 'wt',
                                   which_model = 'full',
                                   show_all = TRUE,
                                   pval_aggregate = FALSE)

dim(sleuth_table_txn)
## [1] 61428    15
#how many of these genes have no NAs in q-value? (these are the ones that get filtered when show_all is FALSE) - 15845
dim(filter(sleuth_table_txn, !is.na(qval)))
## [1] 15845    15
sleuth_table_txn[1:5,]
##         ens_gene  target_id
## 1 WBGene00016627  C44B7.5.1
## 2 WBGene00008862  F15D4.5.1
## 3 WBGene00020218  T04G9.7.1
## 4 WBGene00016453 C35E7.1a.1
## 5 WBGene00000657  F38A3.1.1
##                                                   description ext_gene
## 1                                                              C44B7.5
## 2                                                              F15D4.5
## 3                                                              T04G9.7
## 4 Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]    vet-2
## 5              COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]   col-81
##   entrezgene_id         pval         qval         b      se_b mean_obs  var_obs
## 1        174124 7.057950e-85 1.118332e-80 -2.188855 0.1121192 8.681924 1.387181
## 2        174969 1.613360e-81 1.278185e-77  5.590020 0.2923135 5.984510 9.022435
## 3        180424 1.576815e-60 8.328212e-57 -2.091243 0.1274238 7.699620 1.263721
## 4        172993 1.408001e-49 5.577445e-46 -1.915964 0.1294338 8.088581 1.062373
## 5        174686 3.739522e-46 1.185054e-42  2.929165 0.2053731 7.883827 2.523736
##      tech_var     sigma_sq smooth_sigma_sq final_sigma_sq
## 1 0.006406936  0.014941130      0.01873447     0.01873447
## 2 0.123748994 -0.013684067      0.04714542     0.04714542
## 3 0.011712903  0.004862748      0.02076073     0.02076073
## 4 0.013814870  0.001980859      0.01969134     0.01969134
## 5 0.014387106  0.069969075      0.02012675     0.06996908
sleuth_significant_txn <- dplyr::filter(sleuth_table_txn, qval <= 0.05)
dim(sleuth_significant_txn) # 2263 transcripts are significant
## [1] 2263   15
head(sleuth_significant_txn, 20)
##          ens_gene   target_id
## 1  WBGene00016627   C44B7.5.1
## 2  WBGene00008862   F15D4.5.1
## 3  WBGene00020218   T04G9.7.1
## 4  WBGene00016453  C35E7.1a.1
## 5  WBGene00000657   F38A3.1.1
## 6  WBGene00022644   ZK6.10a.1
## 7  WBGene00012557 Y37D8A.19.1
## 8  WBGene00003977   F56G4.2.1
## 9  WBGene00010158   F56G4.3.1
## 10 WBGene00010808   M01E5.6.1
## 11 WBGene00007196  B0513.4a.1
## 12 WBGene00018605   F48E3.4.1
## 13 WBGene00017648  F20H11.5.1
## 14 WBGene00000301  T13F2.8a.1
## 15 WBGene00018138   F37B4.7.1
## 16 WBGene00009897  F49E12.1.1
## 17 WBGene00021005 W03F11.1a.1
## 18 WBGene00011432   T04D3.2.1
## 19 WBGene00000703    M18.1a.1
## 20 WBGene00010204   F57F5.1.1
##                                                                             description
## 1                                                                                      
## 2                                                                                      
## 3                                                                                      
## 4                           Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5                                        COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 6  Downstream Of DAF-16 (Regulated by DAF-16)  [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 7                                                                                      
## 8        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 9        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 10                                                                                     
## 11                                                                                     
## 12                                                                                     
## 13                      D-aspartate oxidase 3  [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 14                                 Caveolin-1  [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 15                  Folate-like transporter 2  [Source:UniProtKB/Swiss-Prot;Acc:O45166]
## 16                          Peroxidase skpo-1  [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 17             Uterine Lumin Expressed/locailized  [Source:UniProtKB/TrEMBL;Acc:X5M5P2]
## 18             SKN-1 Dependent Zygotic transcript  [Source:UniProtKB/TrEMBL;Acc:O02297]
## 19                                       COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q21556]
## 20                                                                                     
##     ext_gene entrezgene_id         pval         qval         b      se_b
## 1    C44B7.5        174124 7.057950e-85 1.118332e-80 -2.188855 0.1121192
## 2    F15D4.5        174969 1.613360e-81 1.278185e-77  5.590020 0.2923135
## 3    T04G9.7        180424 1.576815e-60 8.328212e-57 -2.091243 0.1274238
## 4      vet-2        172993 1.408001e-49 5.577445e-46 -1.915964 0.1294338
## 5     col-81        174686 3.739522e-46 1.185054e-42  2.929165 0.2053731
## 6     dod-19        178564 6.131492e-41 1.619225e-37  1.624992 0.1212775
## 7  Y37D8A.19        176711 3.821684e-39 8.650655e-36 -1.418486 0.1083750
## 8    pes-2.1        173025 4.000704e-38 7.043462e-35 -2.491918 0.1930361
## 9    pes-2.2        173026 4.000704e-38 7.043462e-35 -2.491918 0.1930361
## 10    sepa-1        173196 8.901306e-38 1.410412e-34 -2.208029 0.1718666
## 11   B0513.4        178351 1.780045e-37 2.564074e-34 -1.673971 0.1308443
## 12   F48E3.4        181006 1.672903e-36 2.208929e-33 -1.422679 0.1127469
## 13     ddo-3        184746 6.915372e-36 8.428775e-33 -1.399326 0.1118917
## 14     cav-1        177815 1.116744e-33 1.263915e-30 -1.591271 0.1315598
## 15    folt-2        178745 3.081094e-32 3.254662e-29 -1.266567 0.1071556
## 16    skpo-1        174340 5.041138e-31 4.992302e-28 -1.228911 0.1060985
## 17     ule-1        171778 2.273631e-29 2.119158e-26 -1.231102 0.1094155
## 18    sdz-30        173198 2.532942e-29 2.229693e-26 -2.685465 0.2388758
## 19   col-129        178155 5.864490e-29 4.890676e-26  2.720450 0.2435987
## 20   F57F5.1        179645 2.536609e-28 2.009629e-25 -1.276974 0.1157005
##     mean_obs   var_obs    tech_var      sigma_sq smooth_sigma_sq final_sigma_sq
## 1   8.681924 1.3871805 0.006406936  0.0149411299      0.01873447     0.01873447
## 2   5.984510 9.0224349 0.123748994 -0.0136840674      0.04714542     0.04714542
## 3   7.699620 1.2637210 0.011712903  0.0048627478      0.02076073     0.02076073
## 4   8.088581 1.0623734 0.013814870  0.0019808591      0.01969134     0.01969134
## 5   7.883827 2.5237360 0.014387106  0.0699690755      0.02012675     0.06996908
## 6   8.195626 0.7631147 0.009918284  0.0001820417      0.01949818     0.01949818
## 7   9.265287 0.5950208 0.003512276  0.0199779905      0.01899745     0.01997799
## 8   6.538641 1.8380668 0.015789672  0.0587362241      0.03029372     0.05873622
## 9   6.538641 1.8380668 0.015789672  0.0587362241      0.03029372     0.05873622
## 10  6.528977 1.4333994 0.028600125  0.0185683623      0.03047611     0.03047611
## 11  7.571620 0.8217040 0.012797086  0.0117981168      0.02144336     0.02144336
## 12  8.878642 0.6000823 0.004552084  0.0208716297      0.01872224     0.02087163
## 13  8.419052 0.5739371 0.006008771  0.0108797729      0.01903073     0.01903073
## 14  7.539687 0.7471720 0.012961105  0.0146920667      0.02165486     0.02165486
## 15 10.118191 0.4730252 0.001910259  0.0152218968      0.02105439     0.02105439
## 16  9.551359 0.4343759 0.003086547  0.0002776849      0.01942725     0.01942725
## 17  8.688486 0.4531146 0.005211902  0.0182179264      0.01873159     0.01873159
## 18  5.798016 2.1486703 0.057929880  0.0449449658      0.05619342     0.05619342
## 19  8.242931 2.2162546 0.011000825  0.1076798091      0.01940853     0.10767981
## 20 10.940901 0.4787103 0.001103854  0.0138371405      0.02566937     0.02566937
write.table(sleuth_significant_txn$target_id, file="results/sleuth_significant_TXN_q005.txt", quote=FALSE, row.names = FALSE, col.names = "#target_id")  #2263 transcripts written

#the transcript table contains beta values too so one can talk about up and down regulated transcripts and genes
sleuth_significant_txn.up <- dplyr::filter(sleuth_table_txn, (qval <= 0.05) & (b>0))  # 905 are upregulated
head(sleuth_significant_txn.up, 20)
##          ens_gene    target_id
## 1  WBGene00008862    F15D4.5.1
## 2  WBGene00000657    F38A3.1.1
## 3  WBGene00022644    ZK6.10a.1
## 4  WBGene00000703     M18.1a.1
## 5  WBGene00017186    F07B7.2.1
## 6  WBGene00021107    W09B7.2.1
## 7  WBGene00000712    F41F3.4.1
## 8  WBGene00001399    F10D2.9.1
## 9  WBGene00017071    D2096.3.1
## 10 WBGene00000998  K07E12.1a.1
## 11 WBGene00019617    K10C2.1.1
## 12 WBGene00010605   K06G5.1a.3
## 13 WBGene00001758 Y45G12C.2a.1
## 14 WBGene00012445   Y16B4A.2.1
## 15 WBGene00010822   M01G12.9.1
## 16 WBGene00020128     R193.2.1
## 17 WBGene00012018   T25C12.3.1
## 18 WBGene00022497 Y119D3B.21.1
## 19 WBGene00003515    K12F2.1.1
## 20 WBGene00021050      W05H9.3
##                                                                             description
## 1                                                                                      
## 2                                        COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 3  Downstream Of DAF-16 (Regulated by DAF-16)  [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 4                                        COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q21556]
## 5                                                                                      
## 6                                                                                      
## 7                                        COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20282]
## 8        Delta(9)-fatty-acid desaturase fat-7  [Source:UniProtKB/Swiss-Prot;Acc:G5EGH6]
## 9                   Acid Alpha Glucosidase Relate  [Source:UniProtKB/TrEMBL;Acc:Q19004]
## 10                                 Mesocentin  [Source:UniProtKB/Swiss-Prot;Acc:Q09165]
## 11                               Carboxypeptidase  [Source:UniProtKB/TrEMBL;Acc:Q94269]
## 12                                                                                     
## 13             Glutathione S-transferase P 10  [Source:UniProtKB/Swiss-Prot;Acc:Q9N4X8]
## 14                               Carboxypeptidase  [Source:UniProtKB/TrEMBL;Acc:K8ESM2]
## 15                                                                                     
## 16                                                                                     
## 17                                                                                     
## 18                                                                                     
## 19                                   Myosin-3  [Source:UniProtKB/Swiss-Prot;Acc:P12844]
## 20                                                                                     
##      ext_gene entrezgene_id         pval         qval        b      se_b
## 1     F15D4.5        174969 1.613360e-81 1.278185e-77 5.590020 0.2923135
## 2      col-81        174686 3.739522e-46 1.185054e-42 2.929165 0.2053731
## 3      dod-19        178564 6.131492e-41 1.619225e-37 1.624992 0.1212775
## 4     col-129        178155 5.864490e-29 4.890676e-26 2.720450 0.2435987
## 5     F07B7.2        179263 3.689346e-28 2.657167e-25 5.295657 0.4812847
## 6     W09B7.2        189307 3.689346e-28 2.657167e-25 5.295657 0.4812847
## 7     col-139        178857 5.933925e-28 4.087959e-25 2.802625 0.2557083
## 8       fat-7        179100 7.520251e-28 4.964932e-25 1.238670 0.1132366
## 9      aagr-1        177632 2.947247e-26 1.796120e-23 1.285199 0.1212335
## 10      dig-1        175951 5.555927e-24 2.839795e-21 1.448524 0.1434262
## 11    K10C2.1        180915 6.912340e-24 3.422688e-21 1.082287 0.1073911
## 12    K06G5.1        181531 2.726013e-23 1.308899e-20 1.176876 0.1183712
## 13     gst-10        178725 1.757955e-22 8.192588e-20 1.430538 0.1466484
## 14   Y16B4A.2        181572 1.881803e-22 8.519191e-20 1.107607 0.1136243
## 15   M01G12.9        187385 5.564544e-22 2.449172e-19 3.102246 0.3219007
## 16     R193.2        182256 1.774988e-21 7.601264e-19 1.266816 0.1331047
## 17   T25C12.3        259727 2.854599e-21 1.190293e-18 1.082411 0.1143240
## 18 Y119D3B.21        246006 2.328616e-20 8.999250e-18 1.129906 0.1222037
## 19      myo-3        179676 9.198664e-20 3.389601e-17 1.509785 0.1659464
## 20    W05H9.3          <NA> 1.125043e-19 3.961402e-17 1.383175 0.1523970
##     mean_obs   var_obs    tech_var      sigma_sq smooth_sigma_sq final_sigma_sq
## 1   5.984510 9.0224349 0.123748994 -0.0136840674      0.04714542     0.04714542
## 2   7.883827 2.5237360 0.014387106  0.0699690755      0.02012675     0.06996908
## 3   8.195626 0.7631147 0.009918284  0.0001820417      0.01949818     0.01949818
## 4   8.242931 2.2162546 0.011000825  0.1076798091      0.01940853     0.10767981
## 5   4.548737 8.4096536 0.182373435  0.2808964331      0.21121433     0.28089643
## 6   4.548737 8.4096536 0.182373435  0.2808964331      0.21121433     0.28089643
## 7   8.429522 2.3562939 0.008941758  0.1218317041      0.01901130     0.12183170
## 8   9.018604 0.4603540 0.005038598  0.0206064395      0.01878882     0.02060644
## 9   8.230056 0.4859733 0.009961643  0.0064280629      0.01943347     0.01943347
## 10  9.056253 0.6347564 0.021076051  0.0200660739      0.01881452     0.02006607
## 11  9.067225 0.3466997 0.004243176  0.0097917632      0.01882251     0.01882251
## 12  9.502552 0.4013749 0.008687104 -0.0020950765      0.01933636     0.01933636
## 13  7.269504 0.6215641 0.016870246  0.0261412419      0.02336898     0.02614124
## 14  8.448535 0.3643348 0.006843142  0.0092831966      0.01897781     0.01897781
## 15  5.192433 2.7723589 0.102884208 -0.0764415518      0.10435592     0.10435592
## 16  7.527236 0.4656110 0.013691320 -0.0054193025      0.02174240     0.02174240
## 17 10.267849 0.3571526 0.001730881  0.0244090848      0.02167540     0.02440908
## 18  9.682303 0.3903688 0.002851854  0.0270156523      0.01970800     0.02701565
## 19  8.599196 0.6984802 0.006945897  0.0481305326      0.01878800     0.04813053
## 20  8.060157 0.5864352 0.008808961  0.0376407567      0.01974351     0.03764076
#how many genes are upregulated? - 889
length(unique(sleuth_significant_txn.up$ens_gene))
## [1] 889
#we will save the gene ids instead of transcript ones
write.table(unique(sleuth_significant_txn.up$ens_gene), file="results/sleuth_significant_q005_up.txt", quote=FALSE, row.names = FALSE, col.names = "#ens_gene")
sleuth_significant_txn.down <- dplyr::filter(sleuth_table_txn, (qval <= 0.05) & (b<0)) # 1358 are downregulated
head(sleuth_significant_txn.down, 20)
##          ens_gene   target_id
## 1  WBGene00016627   C44B7.5.1
## 2  WBGene00020218   T04G9.7.1
## 3  WBGene00016453  C35E7.1a.1
## 4  WBGene00012557 Y37D8A.19.1
## 5  WBGene00003977   F56G4.2.1
## 6  WBGene00010158   F56G4.3.1
## 7  WBGene00010808   M01E5.6.1
## 8  WBGene00007196  B0513.4a.1
## 9  WBGene00018605   F48E3.4.1
## 10 WBGene00017648  F20H11.5.1
## 11 WBGene00000301  T13F2.8a.1
## 12 WBGene00018138   F37B4.7.1
## 13 WBGene00009897  F49E12.1.1
## 14 WBGene00021005 W03F11.1a.1
## 15 WBGene00011432   T04D3.2.1
## 16 WBGene00010204   F57F5.1.1
## 17 WBGene00018393   F43E2.5.2
## 18 WBGene00017490  F15E11.1.1
## 19 WBGene00017500 F15E11.14.1
## 20 WBGene00000785   W07B8.5.1
##                                                                         description
## 1                                                                                  
## 2                                                                                  
## 3                       Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 4                                                                                  
## 5    Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 6    Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 7                                                                                  
## 8                                                                                  
## 9                                                                                  
## 10                  D-aspartate oxidase 3  [Source:UniProtKB/Swiss-Prot;Acc:O01739]
## 11                             Caveolin-1  [Source:UniProtKB/Swiss-Prot;Acc:Q94051]
## 12              Folate-like transporter 2  [Source:UniProtKB/Swiss-Prot;Acc:O45166]
## 13                      Peroxidase skpo-1  [Source:UniProtKB/Swiss-Prot;Acc:Q20616]
## 14         Uterine Lumin Expressed/locailized  [Source:UniProtKB/TrEMBL;Acc:X5M5P2]
## 15         SKN-1 Dependent Zygotic transcript  [Source:UniProtKB/TrEMBL;Acc:O02297]
## 16                                                                                 
## 17           Methionine Sulfoxide Reductase A  [Source:UniProtKB/TrEMBL;Acc:O02089]
## 18          Protein Up-regulated in Daf-2(Gf)  [Source:UniProtKB/TrEMBL;Acc:G5EBF3]
## 19          Protein Up-regulated in Daf-2(Gf)  [Source:UniProtKB/TrEMBL;Acc:G5EBF3]
## 20 Cathepsin B-like cysteine proteinase 5  [Source:UniProtKB/Swiss-Prot;Acc:P43509]
##     ext_gene entrezgene_id         pval         qval         b      se_b
## 1    C44B7.5        174124 7.057950e-85 1.118332e-80 -2.188855 0.1121192
## 2    T04G9.7        180424 1.576815e-60 8.328212e-57 -2.091243 0.1274238
## 3      vet-2        172993 1.408001e-49 5.577445e-46 -1.915964 0.1294338
## 4  Y37D8A.19        176711 3.821684e-39 8.650655e-36 -1.418486 0.1083750
## 5    pes-2.1        173025 4.000704e-38 7.043462e-35 -2.491918 0.1930361
## 6    pes-2.2        173026 4.000704e-38 7.043462e-35 -2.491918 0.1930361
## 7     sepa-1        173196 8.901306e-38 1.410412e-34 -2.208029 0.1718666
## 8    B0513.4        178351 1.780045e-37 2.564074e-34 -1.673971 0.1308443
## 9    F48E3.4        181006 1.672903e-36 2.208929e-33 -1.422679 0.1127469
## 10     ddo-3        184746 6.915372e-36 8.428775e-33 -1.399326 0.1118917
## 11     cav-1        177815 1.116744e-33 1.263915e-30 -1.591271 0.1315598
## 12    folt-2        178745 3.081094e-32 3.254662e-29 -1.266567 0.1071556
## 13    skpo-1        174340 5.041138e-31 4.992302e-28 -1.228911 0.1060985
## 14     ule-1        171778 2.273631e-29 2.119158e-26 -1.231102 0.1094155
## 15    sdz-30        173198 2.532942e-29 2.229693e-26 -2.685465 0.2388758
## 16   F57F5.1        179645 2.536609e-28 2.009629e-25 -1.276974 0.1157005
## 17    msra-1        185709 4.221976e-27 2.675888e-24 -1.896329 0.1758917
## 18   pud-2.1        178713 2.139720e-25 1.210852e-22 -1.192239 0.1144839
## 19   pud-2.2        178710 2.139720e-25 1.210852e-22 -1.192239 0.1144839
## 20     cpr-5        178612 2.938093e-25 1.605313e-22 -1.261826 0.1215186
##     mean_obs   var_obs    tech_var     sigma_sq smooth_sigma_sq final_sigma_sq
## 1   8.681924 1.3871805 0.006406936 0.0149411299      0.01873447     0.01873447
## 2   7.699620 1.2637210 0.011712903 0.0048627478      0.02076073     0.02076073
## 3   8.088581 1.0623734 0.013814870 0.0019808591      0.01969134     0.01969134
## 4   9.265287 0.5950208 0.003512276 0.0199779905      0.01899745     0.01997799
## 5   6.538641 1.8380668 0.015789672 0.0587362241      0.03029372     0.05873622
## 6   6.538641 1.8380668 0.015789672 0.0587362241      0.03029372     0.05873622
## 7   6.528977 1.4333994 0.028600125 0.0185683623      0.03047611     0.03047611
## 8   7.571620 0.8217040 0.012797086 0.0117981168      0.02144336     0.02144336
## 9   8.878642 0.6000823 0.004552084 0.0208716297      0.01872224     0.02087163
## 10  8.419052 0.5739371 0.006008771 0.0108797729      0.01903073     0.01903073
## 11  7.539687 0.7471720 0.012961105 0.0146920667      0.02165486     0.02165486
## 12 10.118191 0.4730252 0.001910259 0.0152218968      0.02105439     0.02105439
## 13  9.551359 0.4343759 0.003086547 0.0002776849      0.01942725     0.01942725
## 14  8.688486 0.4531146 0.005211902 0.0182179264      0.01873159     0.01873159
## 15  5.798016 2.1486703 0.057929880 0.0449449658      0.05619342     0.05619342
## 16 10.940901 0.4787103 0.001103854 0.0138371405      0.02566937     0.02566937
## 17  8.569407 1.0743982 0.043060250 0.0117159100      0.01881556     0.01881556
## 18  7.970674 0.4285923 0.004944962 0.0212681820      0.01992090     0.02126818
## 19  7.970674 0.4285923 0.004944962 0.0212681820      0.01992090     0.02126818
## 20  9.223471 0.4802299 0.004121652 0.0254118707      0.01895503     0.02541187
#how many genes are downpregulated? - 1334
length(unique(sleuth_significant_txn.down$ens_gene))
## [1] 1334
write.table(unique(sleuth_significant_txn.down$ens_gene), file="results/sleuth_significant_q005_down.txt", quote=FALSE, row.names = FALSE, col.names = "#ens_gene")

######## SUMMARY OF NUMBERS  ##########
#all transcripts and genes included initially in the analysis: 61428 transcripts; 46904 genes
length(sleuth_table_txn$target_id)
## [1] 61428
length(sleuth_table_wt$target_id)
## [1] 46904
#note that numbers of genes in the transcript table and number of genes in the gene table are not exactly equal
#there are 46881 genes in the transcript table
length(unique(sleuth_table_txn$ens_gene))
## [1] 46881
#and 46904 genes in the gene table
table(sleuth_table_wt$target_id %!in% unique(sleuth_table_txn$ens_gene))
## 
## FALSE  TRUE 
## 46881    23
#which genes are missing?
extras <- sleuth_table_wt$target_id[sleuth_table_wt$target_id %!in% unique(sleuth_table_txn$ens_gene)]
#why? they are entries with NAs (mostly rrns) so would have been filtered anyway             
filter(sleuth_table_wt, sleuth_table_wt$target_id %in%  extras)
##         target_id description ext_gene entrezgene_id num_aggregated_transcripts
## 1  WBGene00004512              rrn-1.1          <NA>                         NA
## 2  WBGene00004513              rrn-1.2          <NA>                         NA
## 3  WBGene00004567              rrn-2.1          <NA>                         NA
## 4  WBGene00004622              rrn-3.1          <NA>                         NA
## 5  WBGene00004677             rrn-3.56          <NA>                         NA
## 6  WBGene00014454               MTCE.7          <NA>                         NA
## 7  WBGene00014472              MTCE.33          <NA>                         NA
## 8  WBGene00014621              rrn-4.1          <NA>                         NA
## 9  WBGene00077465              rrn-4.2          <NA>                         NA
## 10 WBGene00077466              rrn-4.3          <NA>                         NA
## 11 WBGene00077467              rrn-4.4          <NA>                         NA
## 12 WBGene00077468              rrn-4.5          <NA>                         NA
## 13 WBGene00077469              rrn-4.6          <NA>                         NA
## 14 WBGene00077470              rrn-4.7          <NA>                         NA
## 15 WBGene00077471              rrn-4.8          <NA>                         NA
## 16 WBGene00077472              rrn-4.9          <NA>                         NA
## 17 WBGene00077473             rrn-4.10          <NA>                         NA
## 18 WBGene00077474             rrn-4.11          <NA>                         NA
## 19 WBGene00077475             rrn-4.12          <NA>                         NA
## 20 WBGene00077476             rrn-4.13          <NA>                         NA
## 21 WBGene00077477             rrn-4.14          <NA>                         NA
## 22 WBGene00189966             rrn-4.15          <NA>                         NA
## 23 WBGene00235197             rrn-4.16          <NA>                         NA
##    sum_mean_obs_counts pval qval
## 1                   NA   NA   NA
## 2                   NA   NA   NA
## 3                   NA   NA   NA
## 4                   NA   NA   NA
## 5                   NA   NA   NA
## 6                   NA   NA   NA
## 7                   NA   NA   NA
## 8                   NA   NA   NA
## 9                   NA   NA   NA
## 10                  NA   NA   NA
## 11                  NA   NA   NA
## 12                  NA   NA   NA
## 13                  NA   NA   NA
## 14                  NA   NA   NA
## 15                  NA   NA   NA
## 16                  NA   NA   NA
## 17                  NA   NA   NA
## 18                  NA   NA   NA
## 19                  NA   NA   NA
## 20                  NA   NA   NA
## 21                  NA   NA   NA
## 22                  NA   NA   NA
## 23                  NA   NA   NA
#number of transcripts (15845) and genes (11337) with sleuth data in the gene p-value aggregation table i.e. filtered genes and transcripts
sum(sleuth_table_wt$num_aggregated_transcripts, na.rm=TRUE )
## [1] 15845
length(sleuth_table_wt[!is.na(sleuth_table_wt$num_aggregated_transcripts),] $target_id)
## [1] 11337
#significantly DE genes (aggregated) in the gene results table at q<0.05 - 2328 genes
length(sleuth_significant_wt$target_id)
## [1] 2328
#significantly DE transcripts in the transcript results table - 2263
length(sleuth_significant_txn$target_id) 
## [1] 2263
#how many genes correspond to these 2263 transcripts? - 2217
length(unique(sleuth_significant_txn$ens_gene))
## [1] 2217
########### PROBLEM???
#Need to check this with Pachter's blogs and guidelines as the transcript and gene results are supposed
#to be fully compatible (but I think it's fine because the top DE genes and genes corresponding to top DE
#transcripts seem to agree..it's marginal cases that seem to be the problem)
#genes in common:
length(intersect(unique(sleuth_significant_txn$ens_gene), sleuth_significant_wt$target_id)) # answer is 2107
## [1] 2107
#which genes are not in the transcript table but are in the gene table as differentially expressed?
#genes unique to the transcript table:
length(setdiff(unique(sleuth_significant_txn$ens_gene), sleuth_significant_wt$target_id)) #answer is 110
## [1] 110
#genes unique to the gene table:
length(setdiff(sleuth_significant_wt$target_id, unique(sleuth_significant_txn$ens_gene))) #answer is 221
## [1] 221
#compare the very top results - they are identical up to top 50 and start having more differences as they build up...
length(intersect(sleuth_significant_wt$ext_gene[1:50], unique(sleuth_significant_txn$ext_gene)[1:50]))
## [1] 50
length(intersect(sleuth_significant_wt$ext_gene[1:100], unique(sleuth_significant_txn$ext_gene)[1:100]))
## [1] 96
length(intersect(sleuth_significant_wt$ext_gene[1:500], unique(sleuth_significant_txn$ext_gene)[1:500]))
## [1] 484
###################

#significantly DE transcripts with large fold changes (abs(b)>1) - 480 transcripts
dim(filter(sleuth_significant_txn, abs(b) > 1))
## [1] 480  15
#how many genes correspond to these? - 469 genes
length(filter(sleuth_significant_txn, abs(b) > 1)$ens_gene)
## [1] 480
length(unique(filter(sleuth_significant_txn, abs(b) > 1)$ens_gene))
## [1] 469
#volcano plot for genes
#Note: there is a problem with volcano and MA plots not setting hte p-value aggregate to FALSE when reading
#results from the WT test and thus not realising there is a b column
#Error reported is: 
#Error in FUN(X[[i]], ...) : object 'b' not found
#To fix this, manually set the pval_aggregate in so before supplying to plots as suggested here:
#https://github.com/pachterlab/sleuth/issues/233
# so$pval_aggregate <- FALSE
# plot_volcano(obj=so, 
        # test = "conditionneurexin", 
        # test_type = "wt", 
        # which_model="full", 
        # sig_level = 0.05, 
        # sig_color = "red")
#can also do MA plot
# plot_ma(obj=so, 
        # test = "conditionneurexin", 
        # test_type = "wt", 
        # which_model="full", 
        # sig_level = 0.05, 
        # sig_color = "red")
# ALTERNATIVELY
#use the EnhancedVolcano package and function to make nicer plots and to also highlight 
#genes easily (the highlight function in plot_volcano from sleuth is not working for me)

#not used as using cutoffs within the volcano plot options
#select.lab <- head(sleuth_significant_txn, 20)$target_id

#use the same function that plot_volcano() uses to produce the data frame needed to make the plot
#we could use also the results table from before and remove the NAs but this is cleaner
toptable <- sleuth_results(so, 
                           test="conditionneurexin", 
                           test_type="wt", 
                           which_model="full",
                           pval_aggregate=FALSE,
                           rename_cols=FALSE,
                           show_all=FALSE)
dim(toptable) #should be the "filtered" transcripts, i.e. 15845
## [1] 15845    15
#add a column that the volcano plot is using later
toptable <- dplyr::mutate(toptable, significant = qval < 0.05)
dim(dplyr::filter(toptable, abs(b)>1 & (significant == TRUE))) #this should be 480 transcripts, as before
## [1] 480  16
EnhancedVolcano(
     toptable=toptable,
     lab=toptable$target_id,
     x='b',
     y='qval',
     xlab="beta_value",
     selectLab = NULL,
     title="Sleuth results (without p-value aggregation)",
     subtitle="",
     legend=c("NS","beta","P","P & beta"),
     legendLabels=c("NS","beta","P","P & beta"),
     transcriptLabSize=5.0,
     pLabellingCutoff=10e-6,
     FCcutoff=1.0
     )
## Warning in EnhancedVolcano(toptable = toptable, lab = toptable$target_id, :
## transcriptLabSize argument deprecated in v1.4 - please use labSize

#create a table with information about top 10 most significant genes
select.res <- dplyr::filter(toptable, abs(b)>1)[1:10,c(1,2,5,6,7)]
dplyr::filter(t2g, target_id %in% select.res$target_id)
##          ens_gene   target_id
## 1  WBGene00000657   F38A3.1.1
## 2  WBGene00003977   F56G4.2.1
## 3  WBGene00008862   F15D4.5.1
## 4  WBGene00010158   F56G4.3.1
## 5  WBGene00010808   M01E5.6.1
## 6  WBGene00012557 Y37D8A.19.1
## 7  WBGene00016453  C35E7.1a.1
## 8  WBGene00016627   C44B7.5.1
## 9  WBGene00020218   T04G9.7.1
## 10 WBGene00022644   ZK6.10a.1
##                                                                             description
## 1                                        COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 2        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 3                                                                                      
## 4        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 5                                                                                      
## 6                                                                                      
## 7                           Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 8                                                                                      
## 9                                                                                      
## 10 Downstream Of DAF-16 (Regulated by DAF-16)  [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
##     ext_gene entrezgene_id
## 1     col-81        174686
## 2    pes-2.1        173025
## 3    F15D4.5        174969
## 4    pes-2.2        173026
## 5     sepa-1        173196
## 6  Y37D8A.19        176711
## 7      vet-2        172993
## 8    C44B7.5        174124
## 9    T04G9.7        180424
## 10    dod-19        178564
merge(x=select.res, y=dplyr::filter(t2g, target_id %in% select.res$target_id), by="target_id", sort=FALSE, no.dups=TRUE)[,c("target_id","ext_gene","entrezgene_id.x","qval","description")]
##      target_id  ext_gene entrezgene_id.x         qval
## 1    C44B7.5.1   C44B7.5          174124 1.118332e-80
## 2    F15D4.5.1   F15D4.5          174969 1.278185e-77
## 3    T04G9.7.1   T04G9.7          180424 8.328212e-57
## 4   C35E7.1a.1     vet-2          172993 5.577445e-46
## 5    F38A3.1.1    col-81          174686 1.185054e-42
## 6    ZK6.10a.1    dod-19          178564 1.619225e-37
## 7  Y37D8A.19.1 Y37D8A.19          176711 8.650655e-36
## 8    F56G4.2.1   pes-2.1          173025 7.043462e-35
## 9    F56G4.3.1   pes-2.2          173026 7.043462e-35
## 10   M01E5.6.1    sepa-1          173196 1.410412e-34
##                                                                             description
## 1                                                                                      
## 2                                                                                      
## 3                                                                                      
## 4                           Very Early Transcript  [Source:UniProtKB/TrEMBL;Acc:H1ZUW7]
## 5                                        COLlagen  [Source:UniProtKB/TrEMBL;Acc:Q20135]
## 6  Downstream Of DAF-16 (Regulated by DAF-16)  [Source:UniProtKB/TrEMBL;Acc:A0A0K3ASK2]
## 7                                                                                      
## 8        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 9        Patterned Expression Site; Pes-2 protein  [Source:UniProtKB/TrEMBL;Acc:G5ECQ5]
## 10
#Examine some bootstrapped counts for individual transcripts of the same gene (note these are log2 transformed by sleuth)
plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[1], units = "est_counts", color_by = "condition")

plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[2], units = "est_counts", color_by = "condition")

plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[3], units = "est_counts", color_by = "condition")

plot_bootstrap(so, target_id=(dplyr::filter(t2g, ens_gene=="WBGene00019606"))$target_id[4], units = "est_counts", color_by = "condition")

#use the shiny app when not knitting the document
#sleuth_live(so)

A note on sleuth counts

The "raw" counts of sleuth correspond more or less to assay(dds) in DESeq2. The "est_counts" from the normalised data frame (so[["obs_norm""]]) are on a similar scale to raw counts and correlate well. The tpm values take into account the length of the genes/transcripts and show reduced correlation to the est_counts. Neither est_counts or tpm values are stored as log normalised - a log must be taken, if one wants to have logged values. However, when the diff expression results are obtained the mean_obs is a value obtained as the log (or log2 if the transformation function has been changed to log2) of the mean of observations across all samples.

As the est_counts are not log-transformed, the PCA plots based on them are likely dominated simply by the genes with very high read counts.

#pick a gene with large counts

t2g[t2g$target_id == "ZK185.9",]
##             ens_gene target_id description ext_gene entrezgene_id
## 60688 WBGene00220238   ZK185.9              ZK185.9            NA
#DEseq2 counts
assay(dds)["WBGene00220238",]
##  N1_S5  N2_S6  N3_S7  N4_S8 SG1_S1 SG2_S2 SG3_S3 SG4_S4 
##   9234   8849  10032   7426   6046   5497   5767   7229
#log2-transformed
log2(assay(dds)["WBGene00220238",])
##    N1_S5    N2_S6    N3_S7    N4_S8   SG1_S1   SG2_S2   SG3_S3   SG4_S4 
## 13.17274 13.11130 13.29232 12.85837 12.56177 12.42443 12.49361 12.81958
#rlog-transformed
assay(rld)["WBGene00220238",]
##    N1_S5    N2_S6    N3_S7    N4_S8   SG1_S1   SG2_S2   SG3_S3   SG4_S4 
## 13.20606 13.00318 13.35027 12.82825 12.54772 12.58265 12.59644 12.63061
#raw counts from sleuth
filter(so[["obs_raw"]], so[["obs_raw"]]$target_id=="ZK185.9")$est_counts
## [1]  9262  8902 10082  7458  6062  5510  5794  7255
#normalised counts from_sleuth
filter(so[["obs_norm"]], so[["obs_norm"]]$target_id=="ZK185.9")$est_counts
## [1] 10904.018  8848.183 12688.245  7280.942  5331.194  5580.740  5628.467
## [8]  5831.883
#log2 and normalised counts from sleuth
mean(log2(filter(so[["obs_norm"]], so[["obs_norm"]]$target_id=="ZK185.9")$est_counts))
## [1] 12.84745
#mean_obs from sleuth transcript table results
filter(sleuth_table_txn, target_id=="ZK185.9")$mean_obs
## [1] 12.84755
#sleuth_matrix results should agree
sleuth_matrix <- sleuth_to_matrix(so, 'obs_norm', 'est_counts')
sleuth_matrix.raw <- sleuth_to_matrix(so, 'obs_raw', 'est_counts')
sleuth_matrix.tpm <- sleuth_to_matrix(so, 'obs_norm', 'tpm')
sleuth_matrix[59889,]
##     N1_S5     N2_S6     N3_S7     N4_S8    SG1_S1    SG2_S2    SG3_S3    SG4_S4 
## 10904.018  8848.183 12688.245  7280.942  5331.194  5580.740  5628.467  5831.883
log2(mean(sleuth_matrix[59889,]))
## [1] 12.92216
log2(mean(sleuth_matrix.raw[59889,]))
## [1] 12.88047
sleuth_matrix.tpm[59889,]
##    N1_S5    N2_S6    N3_S7    N4_S8   SG1_S1   SG2_S2   SG3_S3   SG4_S4 
## 32939.89 27935.04 39259.63 23175.80 16106.07 17440.62 17340.28 18035.60

Exploratory Data Analysis with kallisto/sleuth

#Here, we check PCA plots based on kallisto counts
#NOTE: all PCA plots below are done with the transcript counts as sleuth_to_matrix won't return gene-summarised counts
#first, using sleuth's own plot_pca (this, it seems, uses transcript counts after all)
#In addition, note that est_counts in sleuth are not log2 normalised
#see: https://hbctraining.github.io/DGE_workshop_salmon/lessons/09_sleuth.html

plot_pca(so,
            color_by = 'condition',
            text_labels = TRUE)

#and then using the PCAtools package
#need first the counts table (which_units would be scaled_reads_per_base for genes but it doesn't work for aggregation of pvalues (probably only for the old gene_mode=true))
#sleuth_matrix <- sleuth_to_matrix(so, 'obs_norm', 'scaled_reads_per_base')
sleuth_matrix <- sleuth_to_matrix(so, 'obs_norm', 'est_counts')
dim(sleuth_matrix) #clearly this is counts of transcripts, not genes
## [1] 61428     8
#alternative (this is how plot_pca in sleuth gets the matrix) - THIS WON'T WORK because sleuth.R does
#not make the spread_abundance_by function available...
#mat <- spread_abundance_by(so$obs_norm_filt, 'est_counts', 
#            so$sample_to_covariates$sample)

#will remove using removeVar a % of variables with low variance: we want to keep about 500 transcripts and we have about 60,000 so we will remove 99% of all variables
p<- pca(sleuth_matrix, metadata=coldata, removeVar=0.99)
## -- removing the lower 99% of variables based on variance
dim(p$loadings) #should give about 600 transcripts
## [1] 614   8
#this is essentially the same PCA as by doing: prcomp(t(sleuth_matrix)), scale=FALSE, center=TRUE)

screeplot(p)
## Warning: Removed 2 row(s) containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_point).

#the PCA plot is virtually identical to the one drawn by sleuth's plot_pca()
biplot(p, colby="condition", labSize=4.0, widthConnectors=0.2, xlim=c(-25000, 20000), title="Kallisto est_counts")

pairsplot(p, colby= "condition")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

plotloadings(p)
## -- variables retained:
## C24F3.6.1, C42D8.2a.1, F31E3.5.2, MTCE.26.1, B0035.10.1, K07H8.6a.1, H02I12.7.1, ZC155.8, H02I12.6.1, T27E9.1d.3

#get the top 20 for PC1 and PC2
rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20]
##  [1] "F58E10.3a.1"  "Y67D8C.5h.1"  "C23G10.3.1"   "Y105E8B.1a.1" "F25H2.5.1"   
##  [6] "W09C5.6a.1"   "C07G2.3a.1"   "T23D8.3.1"    "T27E9.7.2"    "F28C6.7a.1"  
## [11] "F01G4.6a.3"   "H06I04.4a.3"  "T25G3.2.1"    "Y17G7B.5a.1"  "C36E6.5.1"   
## [16] "C47D12.1b.1"  "T15B7.3.1"    "C53D5.6.1"    "C14B9.7.1"    "T23G11.3.2"
#or getting all the info about these transcripts
t2g[t2g$target_id %in% rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20],]
##             ens_gene    target_id
## 628   WBGene00000380   C07G2.3a.1
## 839   WBGene00000496    T25G3.2.1
## 1148  WBGene00000716    T15B7.3.1
## 2842  WBGene00001595   T23G11.3.2
## 3592  WBGene00002077    C53D5.6.1
## 4101  WBGene00002978 Y105E8B.1a.1
## 4459  WBGene00003154  Y17G7B.5a.1
## 4921  WBGene00003370    C36E6.5.1
## 6953  WBGene00004433    C14B9.7.1
## 6967  WBGene00004440   F28C6.7a.1
## 6980  WBGene00004445   W09C5.6a.1
## 7044  WBGene00004472   C23G10.3.1
## 9943  WBGene00006725  H06I04.4a.3
## 10848 WBGene00007028  C47D12.1b.1
## 13058 WBGene00008505   F01G4.6a.3
## 13949 WBGene00009119    F25H2.5.1
## 15632 WBGene00010260  F58E10.3a.1
## 18140 WBGene00011944    T23D8.3.1
## 18374 WBGene00012097    T27E9.7.2
## 32556 WBGene00022069  Y67D8C.5h.1
##                                                                                                                                            description
## 628                                                                      T-complex protein 1 subunit epsilon  [Source:UniProtKB/Swiss-Prot;Acc:P47209]
## 839                                                                                    Chitin synthase chs-1  [Source:UniProtKB/Swiss-Prot;Acc:G5ECD6]
## 1148                                                                                                    COLlagen  [Source:UniProtKB/TrEMBL;Acc:O17038]
## 2842                                                         Female germline-specific tumor suppressor gld-1  [Source:UniProtKB/Swiss-Prot;Acc:Q17339]
## 3592                                                                                        IMportin Beta family  [Source:UniProtKB/TrEMBL;Acc:Q9N5V3]
## 4101                                                                            Tropomyosin isoforms a/b/d/f  [Source:UniProtKB/Swiss-Prot;Acc:Q22866]
## 4459                                                                                                DNA helicase  [Source:UniProtKB/TrEMBL;Acc:Q7K7J6]
## 4921                                                                         Myosin regulatory light chain 2  [Source:UniProtKB/Swiss-Prot;Acc:P19626]
## 6953                                                                               60S ribosomal protein L21  [Source:UniProtKB/Swiss-Prot;Acc:P34334]
## 6967                                                                               60S ribosomal protein L26  [Source:UniProtKB/Swiss-Prot;Acc:Q19869]
## 6980                                                                               60S ribosomal protein L31  [Source:UniProtKB/Swiss-Prot;Acc:Q9U332]
## 7044                                                                                40S ribosomal protein S3  [Source:UniProtKB/Swiss-Prot;Acc:P48152]
## 9943  Ubiquitin-like protein 1-40S ribosomal protein S27a Ubiquitin-like protein 1 40S ribosomal protein S27a [Source:UniProtKB/Swiss-Prot;Acc:P37165]
## 10848                                                                     Transcription-associated protein 1  [Source:UniProtKB/Swiss-Prot;Acc:G5EEV2]
## 13058                                                               Phosphate carrier protein, mitochondrial  [Source:UniProtKB/Swiss-Prot;Acc:P40614]
## 13949                                                                              Nucleoside diphosphate kinase  [Source:UniProtKB/TrEMBL;Acc:Q93576]
## 15632                                                                                  DEAD boX helicase homolog  [Source:UniProtKB/TrEMBL;Acc:Q9XUW5]
## 18140                                                                                                                                                 
## 18374                                                                                   ABC transporter, class F  [Source:UniProtKB/TrEMBL;Acc:G5EFG4]
## 32556                                                                         Enhancer of EfL-1 mutant phenotype  [Source:UniProtKB/TrEMBL;Acc:U4PBK1]
##       ext_gene entrezgene_id
## 628      cct-5        175588
## 839      chs-1        172508
## 1148   col-143        179073
## 2842     gld-1        172532
## 3592     imb-3       3565440
## 4101    lev-11        173319
## 4459     mcm-2        174841
## 4921     mlc-2        181775
## 6953    rpl-21        176146
## 6967    rpl-26        174379
## 6980    rpl-31        173235
## 7044     rps-3        175879
## 9943     ubl-1        175413
## 10848    trr-1        174788
## 13058  F01G4.6        178020
## 13949    ndk-1        172939
## 15632   ddx-17        179897
## 18140  T23D8.3        172857
## 18374   abcf-2        176771
## 32556    eel-1        177084
rownames(p$loadings)[order(abs(p$loadings[,2]))][1:20]
##  [1] "Y111B2A.18.5" "T19H12.2.1"   "H06H21.3.1"   "K01G5.4.1"    "ZC434.5.1"   
##  [6] "ZK813.3.1"    "C17F4.7.1"    "C53A5.1.1"    "C44B12.1.1"   "ZC395.10.2"  
## [11] "T04H1.4a.1"   "Y73B6BL.38.1" "Y37E3.8a.1"   "F14H3.6.1"    "Y67D8C.5h.1" 
## [16] "R06C7.4.1"    "C26F1.9.1"    "Y67D8C.5d.1"  "K12C11.2.1"   "F10B5.1.2"
t2g[t2g$target_id %in% rownames(p$loadings)[order(abs(p$loadings[,2]))][1:20],]
##             ens_gene    target_id
## 2274  WBGene00001337    ZC434.5.1
## 6665  WBGene00004296   T04H1.4a.1
## 6675  WBGene00004302    K01G5.4.1
## 6922  WBGene00004421    F10B5.1.2
## 7009  WBGene00004453    C26F1.9.1
## 7147  WBGene00004700 Y111B2A.18.5
## 7470  WBGene00004888   K12C11.2.1
## 12664 WBGene00008262    C53A5.1.1
## 13567 WBGene00008825    F14H3.6.1
## 16839 WBGene00011063    R06C7.4.1
## 23548 WBGene00015913    C17F4.7.1
## 24671 WBGene00016636   C44B12.1.1
## 28243 WBGene00019162   H06H21.3.1
## 30240 WBGene00020588   T19H12.2.1
## 31369 WBGene00021350   Y37E3.8a.1
## 32552 WBGene00022069  Y67D8C.5d.1
## 32556 WBGene00022069  Y67D8C.5h.1
## 32824 WBGene00022257 Y73B6BL.38.1
## 33312 WBGene00022599   ZC395.10.2
## 33639 WBGene00022822    ZK813.3.1
##                                                                                                     description
## 2274                               Glutamyl(E) Amino-acyl tRNA Synthetase  [Source:UniProtKB/TrEMBL;Acc:Q23315]
## 6665                                        DNA repair protein rad-50  [Source:UniProtKB/Swiss-Prot;Acc:O44199]
## 6675                                GTP-binding nuclear protein ran-1  [Source:UniProtKB/Swiss-Prot;Acc:O17915]
## 6922                                        60S ribosomal protein L10  [Source:UniProtKB/Swiss-Prot;Acc:Q09533]
## 7009                                        60S ribosomal protein L39  [Source:UniProtKB/Swiss-Prot;Acc:P52814]
## 7147                 Probable splicing factor, arginine/serine-rich 3  [Source:UniProtKB/Swiss-Prot;Acc:Q9NEW6]
## 7470                                 Small ubiquitin-related modifier  [Source:UniProtKB/Swiss-Prot;Acc:P55853]
## 12664                                              RNAi-Induced Longevity  [Source:UniProtKB/TrEMBL;Acc:O17694]
## 13567                                                                                                          
## 16839                                      Chondroitin proteoglycan 3  [Source:UniProtKB/Swiss-Prot;Acc:Q21771]
## 23548                                                                                                          
## 24671                                                  PERMeable eggshell  [Source:UniProtKB/TrEMBL;Acc:O44145]
## 28243                                        Eukaryotic Initiation Factor  [Source:UniProtKB/TrEMBL;Acc:Q9TXU7]
## 30240 Acidic leucine-rich nuclear phosphoprotein 32-related protein 2  [Source:UniProtKB/Swiss-Prot;Acc:O01615]
## 31369                                                                                                          
## 32552                                  Enhancer of EfL-1 mutant phenotype  [Source:UniProtKB/TrEMBL;Acc:U4PBK1]
## 32556                                  Enhancer of EfL-1 mutant phenotype  [Source:UniProtKB/TrEMBL;Acc:U4PBK1]
## 32824                                 PUF (Pumilio/FBF) domain-containing  [Source:UniProtKB/TrEMBL;Acc:Q8IU00]
## 33312                                     Co-chaperone protein daf-41  [Source:UniProtKB/Swiss-Prot;Acc:Q23280]
## 33639                                                                                                          
##       ext_gene entrezgene_id
## 2274    ears-1        172904
## 6665    rad-50        179678
## 6675     ran-1        176503
## 6922    rpl-10        174311
## 7009    rpl-39        179153
## 7147     rsp-3        176688
## 7470     smo-1        266820
## 12664    ril-1        179958
## 13567  F14H3.6        180077
## 16839    cpg-3        172465
## 23548  C17F4.7        173668
## 24671   perm-2        176933
## 28243  eif-1.A        177254
## 30240 T19H12.2        178881
## 31369  Y37E3.8        171767
## 32552    eel-1        177084
## 32556    eel-1        177084
## 32824   puf-11        177406
## 33312   daf-41        175727
## 33639  ZK813.3        180634
#NOTE: the est_counts are not logged in the PCA so we expect the largest counts to dominate as they have more variance in RNA-seq data
quantile(sleuth_table_txn$mean_obs, na.rm=TRUE)
##         0%        25%        50%        75%       100% 
##  0.7675163  3.5198016  5.7369968  7.8353703 14.8657162
quantile(sleuth_table_txn$var_obs, na.rm=TRUE)
##           0%          25%          50%          75%         100% 
##  0.001561586  0.071094574  0.203265678  0.902161600 27.826633120
filter(sleuth_table_txn, target_id %in% rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20])  #counts here are log2 counts
##          ens_gene    target_id
## 1  WBGene00003370    C36E6.5.1
## 2  WBGene00012097    T27E9.7.2
## 3  WBGene00002978 Y105E8B.1a.1
## 4  WBGene00000716    T15B7.3.1
## 5  WBGene00000496    T25G3.2.1
## 6  WBGene00008505   F01G4.6a.3
## 7  WBGene00004440   F28C6.7a.1
## 8  WBGene00003154  Y17G7B.5a.1
## 9  WBGene00004433    C14B9.7.1
## 10 WBGene00004445   W09C5.6a.1
## 11 WBGene00000380   C07G2.3a.1
## 12 WBGene00006725  H06I04.4a.3
## 13 WBGene00004472   C23G10.3.1
## 14 WBGene00009119    F25H2.5.1
## 15 WBGene00001595   T23G11.3.2
## 16 WBGene00011944    T23D8.3.1
## 17 WBGene00010260  F58E10.3a.1
## 18 WBGene00002077    C53D5.6.1
## 19 WBGene00022069  Y67D8C.5h.1
## 20 WBGene00007028  C47D12.1b.1
##                                                                                                                                         description
## 1                                                                         Myosin regulatory light chain 2  [Source:UniProtKB/Swiss-Prot;Acc:P19626]
## 2                                                                                    ABC transporter, class F  [Source:UniProtKB/TrEMBL;Acc:G5EFG4]
## 3                                                                            Tropomyosin isoforms a/b/d/f  [Source:UniProtKB/Swiss-Prot;Acc:Q22866]
## 4                                                                                                    COLlagen  [Source:UniProtKB/TrEMBL;Acc:O17038]
## 5                                                                                   Chitin synthase chs-1  [Source:UniProtKB/Swiss-Prot;Acc:G5ECD6]
## 6                                                                Phosphate carrier protein, mitochondrial  [Source:UniProtKB/Swiss-Prot;Acc:P40614]
## 7                                                                               60S ribosomal protein L26  [Source:UniProtKB/Swiss-Prot;Acc:Q19869]
## 8                                                                                                DNA helicase  [Source:UniProtKB/TrEMBL;Acc:Q7K7J6]
## 9                                                                               60S ribosomal protein L21  [Source:UniProtKB/Swiss-Prot;Acc:P34334]
## 10                                                                              60S ribosomal protein L31  [Source:UniProtKB/Swiss-Prot;Acc:Q9U332]
## 11                                                                    T-complex protein 1 subunit epsilon  [Source:UniProtKB/Swiss-Prot;Acc:P47209]
## 12 Ubiquitin-like protein 1-40S ribosomal protein S27a Ubiquitin-like protein 1 40S ribosomal protein S27a [Source:UniProtKB/Swiss-Prot;Acc:P37165]
## 13                                                                               40S ribosomal protein S3  [Source:UniProtKB/Swiss-Prot;Acc:P48152]
## 14                                                                              Nucleoside diphosphate kinase  [Source:UniProtKB/TrEMBL;Acc:Q93576]
## 15                                                        Female germline-specific tumor suppressor gld-1  [Source:UniProtKB/Swiss-Prot;Acc:Q17339]
## 16                                                                                                                                                 
## 17                                                                                  DEAD boX helicase homolog  [Source:UniProtKB/TrEMBL;Acc:Q9XUW5]
## 18                                                                                       IMportin Beta family  [Source:UniProtKB/TrEMBL;Acc:Q9N5V3]
## 19                                                                         Enhancer of EfL-1 mutant phenotype  [Source:UniProtKB/TrEMBL;Acc:U4PBK1]
## 20                                                                     Transcription-associated protein 1  [Source:UniProtKB/Swiss-Prot;Acc:G5EEV2]
##    ext_gene entrezgene_id      pval      qval           b      se_b  mean_obs
## 1     mlc-2        181775 0.1573189 0.4156609  0.20255406 0.1432342 10.917980
## 2    abcf-2        176771 0.2613953 0.5546491  1.88264129 1.6762909  8.040718
## 3    lev-11        173319 0.2672519 0.5625815  0.17964413 0.1619271 10.507566
## 4   col-143        179073 0.3920183 0.6871922  0.17741363 0.2072679 11.359617
## 5     chs-1        172508 0.3929816 0.6880750 -0.12320007 0.1442247 10.537196
## 6   F01G4.6        178020 0.4183658 0.7107329  0.39480216 0.4878560  9.811750
## 7    rpl-26        174379 0.4201749 0.7121933  0.10188433 0.1263889 11.523997
## 8     mcm-2        174841 0.4287614 0.7185601 -0.58715189 0.7419972  9.186787
## 9    rpl-21        176146 0.4320922 0.7207602  0.11262738 0.1433618 11.939822
## 10   rpl-31        173235 0.4966842 0.7659363  0.07988817 0.1175317 11.058116
## 11    cct-5        175588 0.5824013 0.8167949 -0.17780989 0.3233610  9.867698
## 12    ubl-1        175413 0.5989128 0.8264193  0.07082058 0.1346489 11.486430
## 13    rps-3        175879 0.6171552 0.8384484  0.07756783 0.1551710 12.547761
## 14    ndk-1        172939 0.6274093 0.8432864  0.06690722 0.1378460 11.993970
## 15    gld-1        172532 0.6503874 0.8555740  0.11618394 0.2563499 10.276923
## 16  T23D8.3        172857 0.8093159 0.9345974 -0.33352247 1.3821404  7.392134
## 17   ddx-17        179897 0.8927598 0.9677811 -0.02900975 0.2151852 10.601373
## 18    imb-3       3565440 0.9493524 0.9862971 -0.01218098 0.1917661 10.965742
## 19    eel-1        177084 0.9607273 0.9890853  0.01289716 0.2619195 10.180434
## 20    trr-1        174788 0.9780775 0.9928535 -0.03543874 1.2896562  8.762508
##       var_obs     tech_var      sigma_sq smooth_sigma_sq final_sigma_sq
## 1  0.04689265 0.0012062850  0.0398257590      0.02549664     0.03982576
## 2  5.82972701 1.9460234703  3.6738786294      0.01978004     3.67387863
## 3  0.05416982 0.0052782755  0.0471625105      0.02285893     0.04716251
## 4  0.08263871 0.0007424335  0.0851775337      0.02937493     0.08517753
## 5  0.03999512 0.0177451347  0.0238564219      0.02302239     0.02385642
## 6  0.45253984 0.0293356041  0.4466712882      0.02003988     0.44667129
## 7  0.02816261 0.0007992522  0.0285969852      0.03114905     0.03114905
## 8  1.04231599 0.4422604901  0.6588590533      0.01892136     0.65885905
## 9  0.03885730 0.0004983207  0.0406068850      0.03661105     0.04060688
## 10 0.02186696 0.0010285086  0.0223555711      0.02659887     0.02659887
## 11 0.18828292 0.0432134808  0.1659111355      0.02020059     0.16591114
## 12 0.03251359 0.0008267791  0.0354338952      0.03072603     0.03543390
## 13 0.02374789 0.0003575085  0.0253427731      0.04779856     0.04779856
## 14 0.03063936 0.0005643507  0.0336893831      0.03743869     0.03743869
## 15 0.11651154 0.0047859463  0.1266446135      0.02171587     0.12664461
## 16 1.89883705 3.7979747665 -1.6197439567      0.02264961     0.02264961
## 17 0.07961986 0.0305765634  0.0620327569      0.02339009     0.06203276
## 18 0.06308395 0.0010506232  0.0724978597      0.02585985     0.07249786
## 19 0.09989635 0.1159014447  0.0005888509      0.02130218     0.02130218
## 20 0.54414177 3.3077146586 -2.6733012304      0.01871170     0.01871170
filter(so[["obs_norm"]], target_id %in% rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20])
##        target_id sample  est_counts          tpm    eff_len   len
## 1     C07G2.3a.1  N1_S5  565.792627   65.9835239  1762.1665  1928
## 2     C07G2.3a.1  N2_S6 1633.898377  189.8387113  1749.8902  1928
## 3     C07G2.3a.1  N3_S7  938.042353  109.0026126  1756.9261  1928
## 4     C07G2.3a.1  N4_S8 1121.754656  130.6415994  1750.0664  1928
## 5     C07G2.3a.1 SG1_S1  935.462090  110.3560996  1755.0990  1928
## 6     C07G2.3a.1 SG2_S2  896.176158  105.0782241  1751.9367  1928
## 7     C07G2.3a.1 SG3_S3  776.560620   91.5892399  1758.6784  1928
## 8     C07G2.3a.1 SG4_S4  912.542312  107.3256718  1756.0500  1928
## 9      C14B9.7.1  N1_S5 4046.329996 2258.6147685   368.1665   534
## 10     C14B9.7.1  N2_S6 3345.650763 1910.7989267   355.9886   534
## 11     C14B9.7.1  N3_S7 4252.487568 2391.6720608   363.0023   534
## 12     C14B9.7.1  N4_S8 3537.963810 2024.5625675   356.1724   534
## 13     C14B9.7.1 SG1_S1 3987.402183 2285.8846728   361.1665   534
## 14     C14B9.7.1 SG2_S2 3294.763435 1890.3256779   358.0349   534
## 15     C14B9.7.1 SG3_S3 4849.379812 2758.2378516   364.6784   534
## 16     C14B9.7.1 SG4_S4 4368.887950 2490.9473398   362.2379   534
## 17    C23G10.3.1  N1_S5 6729.363473 2156.8915674   641.1665   807
## 18    C23G10.3.1  N2_S6 5071.155731 1639.4669422   628.8902   807
## 19    C23G10.3.1  N3_S7 5887.285245 1890.0634762   635.9261   807
## 20    C23G10.3.1  N4_S8 5743.333635 1860.8257213   629.0664   807
## 21    C23G10.3.1 SG1_S1 6216.794450 2029.9307633   634.0990   807
## 22    C23G10.3.1 SG2_S2 5316.388955 1730.8870226   630.9367   807
## 23    C23G10.3.1 SG3_S3 6852.468989 2228.9529891   637.6784   807
## 24    C23G10.3.1 SG4_S4 6317.403937 2054.4367795   635.0872   807
## 25     C36E6.5.1  N1_S5 2194.932340  924.0134801   488.1665   654
## 26     C36E6.5.1  N2_S6 1553.680577  663.7820792   475.8902   654
## 27     C36E6.5.1  N3_S7 2065.416054  873.1054381   482.9574   654
## 28     C36E6.5.1  N4_S8 1500.820981  642.5390254   476.0664   654
## 29     C36E6.5.1 SG1_S1 1971.328879  848.3917238   481.0990   654
## 30     C36E6.5.1 SG2_S2 2100.956693  902.9932260   477.9367   654
## 31     C36E6.5.1 SG3_S3 2181.220415  933.4722941   484.6784   654
## 32     C36E6.5.1 SG4_S4 2052.166454  879.0736131   482.1418   654
## 33   C47D12.1b.1  N1_S5  183.417332    3.0624724 12308.1665 12474
## 34   C47D12.1b.1  N2_S6  622.174884   10.2878122 12295.8902 12474
## 35   C47D12.1b.1  N3_S7  564.028707    9.3596745 12302.9261 12474
## 36   C47D12.1b.1  N4_S8  577.432770    9.5713535 12296.0664 12474
## 37   C47D12.1b.1 SG1_S1  280.671070    4.7241698 12301.0990 12474
## 38   C47D12.1b.1 SG2_S2  283.939459    4.7427610 12297.9367 12474
## 39   C47D12.1b.1 SG3_S3  827.728301   13.9531753 12304.6784 12474
## 40   C47D12.1b.1 SG4_S4  510.786499    8.5753015 12302.0500 12474
## 41     C53D5.6.1  N1_S5 1963.587047  113.8573858  3544.1665  3710
## 42     C53D5.6.1  N2_S6 1707.527528   98.2949152  3531.8902  3710
## 43     C53D5.6.1  N3_S7 2134.188352  123.1201693  3538.9261  3710
## 44     C53D5.6.1  N4_S8 2271.521469  131.0768870  3532.0664  3710
## 45     C53D5.6.1 SG1_S1 1962.470963  114.8755989  3537.0990  3710
## 46     C53D5.6.1 SG2_S2 1438.019837   83.5879848  3533.9367  3710
## 47     C53D5.6.1 SG3_S3 2275.700787  133.3165894  3540.6784  3710
## 48     C53D5.6.1 SG4_S4 2446.843014  142.8333495  3538.0500  3710
## 49    F01G4.6a.3  N1_S5 1199.679839  250.0001510   986.1665  1152
## 50    F01G4.6a.3  N2_S6  412.613208   86.1397623   973.8902  1152
## 51    F01G4.6a.3  N3_S7 1179.461018  245.4795164   980.9261  1152
## 52    F01G4.6a.3  N4_S8  644.628582  134.8835954   974.0664  1152
## 53    F01G4.6a.3 SG1_S1  916.048019  193.7150706   979.0990  1152
## 54    F01G4.6a.3 SG2_S2  582.534896  122.6134330   975.9367  1152
## 55    F01G4.6a.3 SG3_S3 1416.000581  298.8873604   982.6784  1152
## 56    F01G4.6a.3 SG4_S4 1489.382410  313.8668679   980.0500  1152
## 57     F25H2.5.1  N1_S5 4485.485136 1869.1364071   493.1665   659
## 58     F25H2.5.1  N2_S6 3496.731843 1478.3829649   480.8902   659
## 59     F25H2.5.1  N3_S7 4014.630145 1679.6995360   487.9574   659
## 60     F25H2.5.1  N4_S8 4004.615769 1696.6567018   481.0664   659
## 61     F25H2.5.1 SG1_S1 4295.207822 1829.4950856   486.0990   659
## 62     F25H2.5.1 SG2_S2 3355.533741 1427.2799738   482.9367   659
## 63     F25H2.5.1 SG3_S3 4700.750983 1991.1860541   489.6784   659
## 64     F25H2.5.1 SG4_S4 4480.622159 1899.6359120   487.1418   659
## 65    F28C6.7a.1  N1_S5 3384.256498 2293.0075869   303.3074   469
## 66    F28C6.7a.1  N2_S6 2494.390535 1740.4850788   291.3833   469
## 67    F28C6.7a.1  N3_S7 2810.829141 1924.7783971   298.1410   469
## 68    F28C6.7a.1  N4_S8 2750.204724 1922.0556362   291.6334   469
## 69    F28C6.7a.1 SG1_S1 2951.481603 2060.9629545   296.5116   469
## 70    F28C6.7a.1 SG2_S2 2693.786423 1885.8111975   293.4288   469
## 71    F28C6.7a.1 SG3_S3 3503.309227 2423.7831937   299.8059   469
## 72    F28C6.7a.1 SG4_S4 3107.762471 2158.5699941   297.3508   469
## 73   F58E10.3a.1  N1_S5 1255.679622  123.5773521  2088.1665  2254
## 74   F58E10.3a.1  N2_S6 1554.291074  152.2293904  2075.8902  2254
## 75   F58E10.3a.1  N3_S7 1802.236523  176.6468423  2082.9261  2254
## 76   F58E10.3a.1  N4_S8 1721.903878  169.0463501  2076.0664  2254
## 77   F58E10.3a.1 SG1_S1 1631.857072  162.3533092  2081.0990  2254
## 78   F58E10.3a.1 SG2_S2 1105.564582  109.2922964  2077.9367  2254
## 79   F58E10.3a.1 SG3_S3 1525.895787  151.8243139  2084.6784  2254
## 80   F58E10.3a.1 SG4_S4 2029.972006  201.3661079  2082.0500  2254
## 81   H06I04.4a.3  N1_S5 3099.411765 1579.8649924   403.1665   569
## 82   H06I04.4a.3  N2_S6 2423.899364 1260.6864924   390.9111   569
## 83   H06I04.4a.3  N3_S7 2963.928704 1520.4520885   397.9816   569
## 84   H06I04.4a.3  N4_S8 2756.949832 1436.6184339   391.1340   569
## 85   H06I04.4a.3 SG1_S1 2876.315745 1503.3278696   396.1454   569
## 86   H06I04.4a.3 SG2_S2 2390.863217 1249.7412749   392.9817   569
## 87   H06I04.4a.3 SG3_S3 3353.600469 1740.4285325   399.6784   569
## 88   H06I04.4a.3 SG4_S4 3239.499468 1684.5215224   397.1813   569
## 89     T15B7.3.1  N1_S5 2920.646145  705.9925346   850.1665  1016
## 90     T15B7.3.1  N2_S6 1914.256717  464.4977440   837.8902  1016
## 91     T15B7.3.1  N3_S7 2832.382988  684.3861468   844.9261  1016
## 92     T15B7.3.1  N4_S8 2352.555344  572.1363752   838.0664  1016
## 93     T15B7.3.1 SG1_S1 2701.732916  663.4918901   843.0990  1016
## 94     T15B7.3.1 SG2_S2 2102.604208  514.2197232   839.9367  1016
## 95     T15B7.3.1 SG3_S3 3355.824898  822.1221360   846.6784  1016
## 96     T15B7.3.1 SG4_S4 3196.240573  782.0936740   844.0500  1016
## 97     T23D8.3.1  N1_S5  166.347901   26.4355311  1293.1665  1459
## 98     T23D8.3.1  N2_S6  724.018260  114.9233172  1280.8902  1459
## 99     T23D8.3.1  N3_S7   33.814217    5.3601427  1287.9261  1459
## 100    T23D8.3.1  N4_S8  304.270388   48.4090081  1281.0664  1459
## 101    T23D8.3.1 SG1_S1   77.325877   12.4486491  1286.0990  1459
## 102    T23D8.3.1 SG2_S2  241.691835   38.6985331  1282.9367  1459
## 103    T23D8.3.1 SG3_S3   91.952906   14.7890254  1289.6784  1459
## 104    T23D8.3.1 SG4_S4  287.247659   46.0944422  1287.0500  1459
## 105   T23G11.3.2  N1_S5 1474.144287  209.7721782  1444.1665  1610
## 106   T23G11.3.2  N2_S6  882.719954  125.3382893  1431.8902  1610
## 107   T23G11.3.2  N3_S7 1335.059190  189.4218115  1438.9261  1610
## 108   T23G11.3.2  N4_S8 1159.022043  164.9554528  1432.0664  1610
## 109   T23G11.3.2 SG1_S1 1266.560639  182.4782349  1437.0990  1610
## 110   T23G11.3.2 SG2_S2  881.281668  126.2474186  1433.9367  1610
## 111   T23G11.3.2 SG3_S3 1512.961711  217.8293599  1440.6784  1610
## 112   T23G11.3.2 SG4_S4 1645.643905  236.3466019  1438.0500  1610
## 113    T25G3.2.1  N1_S5 1517.345302   74.0645382  4210.1665  4376
## 114    T25G3.2.1  N2_S6 1561.714293   75.6381750  4197.8902  4376
## 115    T25G3.2.1  N3_S7 1282.566033   62.2714988  4204.9261  4376
## 116    T25G3.2.1  N4_S8 1900.599036   92.2739969  4198.0664  4376
## 117    T25G3.2.1 SG1_S1 1383.627255   68.1586928  4203.0990  4376
## 118    T25G3.2.1 SG2_S2 1412.406876   69.0803966  4199.9367  4376
## 119    T25G3.2.1 SG3_S3 1256.939224   61.9769904  4206.6784  4376
## 120    T25G3.2.1 SG4_S4 1670.966360   82.0894440  4204.0500  4376
## 121    T27E9.7.2  N1_S5  819.103102   79.4325773  2119.1665  2285
## 122    T27E9.7.2  N2_S6    4.461545    0.4305404  2106.8902  2285
## 123    T27E9.7.2  N3_S7  436.793962   42.1846820  2113.9261  2285
## 124    T27E9.7.2  N4_S8  198.353126   19.1866415  2107.0664  2285
## 125    T27E9.7.2 SG1_S1  527.446351   51.7053865  2112.0990  2285
## 126    T27E9.7.2 SG2_S2  279.093819   27.1846923  2108.9367  2285
## 127    T27E9.7.2 SG3_S3  667.022275   65.3952501  2115.6784  2285
## 128    T27E9.7.2 SG4_S4  663.043344   64.8066582  2113.0500  2285
## 129   W09C5.6a.1  N1_S5 2247.154431 1676.0835773   275.5259   441
## 130   W09C5.6a.1  N2_S6 1860.002662 1433.5727511   263.7937   441
## 131   W09C5.6a.1  N3_S7 2179.098930 1645.2782702   270.3993   441
## 132   W09C5.6a.1  N4_S8 2029.351157 1566.6135290   264.0180   441
## 133   W09C5.6a.1 SG1_S1 2162.922539 1665.1826586   268.9372   441
## 134   W09C5.6a.1 SG2_S2 1850.666306 1430.0596307   265.8347   441
## 135   W09C5.6a.1 SG3_S3 2451.558197 1868.9216647   272.0863   441
## 136   W09C5.6a.1 SG4_S4 2350.637353 1800.7303943   269.6028   441
## 137 Y105E8B.1a.1  N1_S5 1698.336848  264.5753492  1319.1665  1485
## 138 Y105E8B.1a.1  N2_S6 1146.633514  178.3840690  1306.8902  1485
## 139 Y105E8B.1a.1  N3_S7 1480.567172  230.0515332  1313.9261  1485
## 140 Y105E8B.1a.1  N4_S8 1212.536184  189.0754720  1307.0664  1485
## 141 Y105E8B.1a.1 SG1_S1 1443.686139  227.8127123  1312.0990  1485
## 142 Y105E8B.1a.1 SG2_S2 1330.944100  208.8713736  1308.9367  1485
## 143 Y105E8B.1a.1 SG3_S3 1727.984465  272.4241553  1315.6784  1485
## 144 Y105E8B.1a.1 SG4_S4 1732.967261  272.5816261  1313.0500  1485
## 145  Y17G7B.5a.1  N1_S5  529.992749   40.9436232  2660.1665  2826
## 146  Y17G7B.5a.1  N2_S6 1087.027414   83.4662799  2647.8902  2826
## 147  Y17G7B.5a.1  N3_S7  308.421887   23.7170520  2654.9261  2826
## 148  Y17G7B.5a.1  N4_S8 1460.082202  112.3793638  2648.0664  2826
## 149  Y17G7B.5a.1 SG1_S1  812.464553   63.4048869  2653.0990  2826
## 150  Y17G7B.5a.1 SG2_S2  742.520980   57.5586998  2649.9367  2826
## 151  Y17G7B.5a.1 SG3_S3  158.817091   12.3997740  2656.6784  2826
## 152  Y17G7B.5a.1 SG4_S4  530.577577   41.2883258  2654.0500  2826
## 153  Y67D8C.5h.1  N1_S5 1385.349549   21.9841250 12950.1665 13116
## 154  Y67D8C.5h.1  N2_S6 1417.570864   22.2767509 12937.8902 13116
## 155  Y67D8C.5h.1  N3_S7  956.696930   15.0883850 12944.9261 13116
## 156  Y67D8C.5h.1  N4_S8  946.356074   14.9081354 12938.0664 13116
## 157  Y67D8C.5h.1 SG1_S1 1407.698592   22.5186922 12943.0990 13116
## 158  Y67D8C.5h.1 SG2_S2 1045.700973   16.6001916 12939.9367 13116
## 159  Y67D8C.5h.1 SG3_S3  862.738928   13.8221801 12946.6784 13116
## 160  Y67D8C.5h.1 SG4_S4 1451.006474   23.1519005 12944.0500 13116
#Repeat wiith kallisto TPMs
#note that there is only partial correlation between estimated counts and TPMs (TPMs take into account the gene length; est_counts seem to be normalised by sleuth when obs_norm is used)
sleuth_matrix.tpm <- sleuth_to_matrix(so, 'obs_norm', 'tpm')
dim(sleuth_matrix.tpm) #clearly this is counts of transcripts, not genes
## [1] 61428     8
cor(sleuth_matrix.tpm[,1], sleuth_matrix[,1])
## [1] 0.5984869
#will remove using removeVar a % of variables with low variance: we want to keep about 500 transcripts and we have about 60,000 so we will remove 99% of all variables
p.tpm<- pca(sleuth_matrix.tpm, metadata=coldata, removeVar=0.99)
## -- removing the lower 99% of variables based on variance
dim(p.tpm$loadings) #should give about 600 transcripts
## [1] 614   8
#this is essentially the same PCA as by doing: prcomp(t(sleuth_matrix)), scale=FALSE, center=TRUE)

screeplot(p.tpm)

#FIGURE: PCA_tpm_counts
#the PCA plot is virtually identical to the one drawn by sleuth's plot_pca()
biplot(p.tpm, colby="condition", labSize=4.0, widthConnectors=0.2, title="Kallisto TPM; top 1% varied transcripts")

pairsplot(p.tpm, colby= "condition")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

#this command seems to hang for tpm values so  I'm not running it
#plotloadings(p)
#instead we can plot the top 20 for PC1 and PC2
rownames(p.tpm$loadings)[order(abs(p$loadings[,1]))][1:20]
##  [1] "T04C12.6.4"  "F28H1.2.1"   "Y48G8AL.8.1" "C06A1.12"    "H06I04.4a.3"
##  [6] "F53A2.6a.2"  "F26E4.6.1"   "R144.17"     "Y48G10A.9"   "ZK632.9.2"  
## [11] "F08G2.1.1"   "F43E2.5.2"   "Y54G2A.75.1" "W03G11.1a.1" "K10B3.7.2"  
## [16] "F23H11.5.1"  "D1054.10.1"  "K02D7.3a.1"  "F25H2.5.1"   "T23D8.6.1"
t2g[t2g$target_id %in% rownames(p.tpm$loadings)[order(abs(p.tpm$loadings[,1]))][1:20],]
##             ens_gene   target_id
## 795   WBGene00000465  C07G2.1a.1
## 814   WBGene00000479   C07H6.5.1
## 1936  WBGene00001168   F31E3.5.1
## 2156  WBGene00001243  F41H10.7.1
## 2158  WBGene00001244  F41H10.8.1
## 6889  WBGene00004410 Y62E10A.1.2
## 6942  WBGene00004427  K11H12.2.1
## 11137 WBGene00007235  C01G10.8.1
## 18509 WBGene00012179   W01D2.1.1
## 21924 WBGene00014633     ZK897.4
## 24703 WBGene00016655   C44E4.6.3
## 29271 WBGene00019880   R05D3.6.1
## 33294 WBGene00022582   ZC262.5.1
## 33865 WBGene00023020     F30H5.4
## 33916 WBGene00023071    F56C9.12
## 35569 WBGene00045139    F01D5.12
## 35571 WBGene00045141   Y53F4B.47
## 35585 WBGene00045153    F11A10.9
## 60358 WBGene00219908    F20C5.10
## 60615 WBGene00220165   Y47G6A.36
##                                                                                             description
## 795                                Chondroitin proteoglycan 1  [Source:UniProtKB/Swiss-Prot;Acc:Q17802]
## 814                          ATP-dependent RNA helicase cgh-1  [Source:UniProtKB/Swiss-Prot;Acc:Q95YF3]
## 1936                                Elongation factor 1-alpha  [Source:UniProtKB/Swiss-Prot;Acc:P53013]
## 2156      Elongation of very long chain fatty acids protein 5  [Source:UniProtKB/Swiss-Prot;Acc:Q20300]
## 2158      Elongation of very long chain fatty acids protein 6  [Source:UniProtKB/Swiss-Prot;Acc:Q20303]
## 6889                Ribosomal protein, Large subunit, Acidic (P1)  [Source:UniProtKB/TrEMBL;Acc:Q9U1X9]
## 6942                                60S ribosomal protein L15  [Source:UniProtKB/Swiss-Prot;Acc:P91374]
## 11137 Activator of HSp90 (HSP90) Atpase (ATPase) activity homolog  [Source:UniProtKB/TrEMBL;Acc:Q93168]
## 18509                                       Ribosomal protein L37  [Source:UniProtKB/TrEMBL;Acc:O62388]
## 21924                                                                                                  
## 24703                      Acyl-CoA-binding protein homolog 1  [Source:UniProtKB/Swiss-Prot;Acc:O01805]
## 29271    Putative ATP synthase subunit epsilon, mitochondrial  [Source:UniProtKB/Swiss-Prot;Acc:P34539]
## 33294    Putative ATP synthase subunit epsilon, mitochondrial  [Source:UniProtKB/Swiss-Prot;Acc:P34539]
## 33865                                                                                                  
## 33916                                                                                                  
## 35569                                                                                                  
## 35571                                                                                                  
## 35585                                                                                                  
## 60358                                                                                                  
## 60615                                                                                                  
##        ext_gene entrezgene_id
## 795       cpg-1        175586
## 814       cgh-1        176061
## 1936   eef-1A.1        175975
## 2156      elo-5        177320
## 2158      elo-6        177321
## 6889      rla-2        178297
## 6942     rpl-15        176891
## 11137    ahsa-1        180017
## 18509   W01D2.1        175141
## 21924   ZK897.4            NA
## 24703    acbp-1        172071
## 29271   R05D3.6        176178
## 33294   ZC262.5        191132
## 33865   F30H5.4            NA
## 33916  F56C9.12            NA
## 35569  F01D5.12            NA
## 35571 Y53F4B.47            NA
## 35585  F11A10.9            NA
## 60358  F20C5.10            NA
## 60615 Y47G6A.36            NA
rownames(p.tpm$loadings)[order(abs(p.tpm$loadings[,2]))][1:20]
##  [1] "ZK813.3.1"    "M18.7a.1"     "F40F8.10.1"   "T05G5.10b.2"  "B0280.5.1"   
##  [6] "T07C4.5.1"    "C37A2.7.1"    "T10C6.5.1"    "Y82E9BR.3.2"  "K02F2.11"    
## [11] "H13N06.6b.1"  "T08B2.10.2"   "ZK287.5.2"    "C33H5.12a.1"  "C53H9.1.1"   
## [16] "ZK945.2.2"    "R01H2.6.1"    "Y49E10.15.1"  "C27H5.9"      "Y111B2A.18.5"
t2g[t2g$target_id %in% rownames(p.tpm$loadings)[order(abs(p.tpm$loadings[,2]))][1:20],]
##             ens_gene    target_id
## 259   WBGene00000122     M18.7a.1
## 3564  WBGene00002064  T05G5.10b.2
## 5885  WBGene00003928    ZK945.2.2
## 6706  WBGene00004320    ZK287.5.2
## 6970  WBGene00004441    C53H9.1.1
## 7059  WBGene00004478   F40F8.10.1
## 7073  WBGene00004486   T08B2.10.2
## 7147  WBGene00004700 Y111B2A.18.5
## 7153  WBGene00004703  C33H5.12a.1
## 7515  WBGene00004919  Y49E10.15.1
## 9582  WBGene00006541  H13N06.6b.1
## 9925  WBGene00006713    R01H2.6.1
## 17562 WBGene00011561    T07C4.5.1
## 17729 WBGene00011687    T10C6.5.1
## 22411 WBGene00015102    B0280.5.1
## 24450 WBGene00016493    C37A2.7.1
## 32952 WBGene00022336  Y82E9BR.3.2
## 33639 WBGene00022822    ZK813.3.1
## 35378 WBGene00044938      C27H5.9
## 60449 WBGene00220000     K02F2.11
##                                                                                       description
## 259                       Ref/ALY RNA export adaptor family  [Source:UniProtKB/TrEMBL;Acc:Q21559]
## 3564      Eukaryotic translation initiation factor 5A-1  [Source:UniProtKB/Swiss-Prot;Acc:P34563]
## 5885                    Proteasome subunit alpha type-3  [Source:UniProtKB/Swiss-Prot;Acc:Q09583]
## 6706                                 RING-box protein 1  [Source:UniProtKB/Swiss-Prot;Acc:Q23457]
## 6970                          60S ribosomal protein L27  [Source:UniProtKB/Swiss-Prot;Acc:P91914]
## 7059                           40S ribosomal protein S9  [Source:UniProtKB/Swiss-Prot;Acc:Q20228]
## 7073                          40S ribosomal protein S17  [Source:UniProtKB/Swiss-Prot;Acc:O01692]
## 7147   Probable splicing factor, arginine/serine-rich 3  [Source:UniProtKB/Swiss-Prot;Acc:Q9NEW6]
## 7153   Probable splicing factor, arginine/serine-rich 6  [Source:UniProtKB/Swiss-Prot;Acc:Q18409]
## 7515         Probable small nuclear ribonucleoprotein E  [Source:UniProtKB/Swiss-Prot;Acc:Q9XTU6]
## 9582                          Tyramine beta-hydroxylase  [Source:UniProtKB/Swiss-Prot;Acc:Q9XTQ6]
## 9925                           UBiquitin Conjugating enzyme  [Source:UniProtKB/TrEMBL;Acc:Q21633]
## 17562                     Transthyretin-like protein 15  [Source:UniProtKB/Swiss-Prot;Acc:Q22288]
## 17729      Spliceosome-associated protein CWC15 homolog  [Source:UniProtKB/Swiss-Prot;Acc:O45766]
## 22411                        Chondroitin proteoglycan-2  [Source:UniProtKB/Swiss-Prot;Acc:P41996]
## 24450                   60S acidic ribosomal protein P2  [Source:UniProtKB/Swiss-Prot;Acc:O01504]
## 32952 ATP synthase lipid-binding protein, mitochondrial  [Source:UniProtKB/Swiss-Prot;Acc:Q9BKS0]
## 33639                                                                                            
## 35378                                                                                            
## 60449                                                                                            
##        ext_gene entrezgene_id
## 259       aly-3        178157
## 3564      iff-1        176373
## 5885      pas-7        174571
## 6706      rbx-1        179358
## 6970     rpl-27        171750
## 7059      rps-9        174699
## 7073     rps-17        172313
## 7147      rsp-3        176688
## 7153      rsp-6        177566
## 7515      snr-6        176668
## 9582      tbh-1        181639
## 9925     ubc-18        175985
## 17562    ttr-15        176448
## 17729    cwc-15        180069
## 22411     cpg-2        175991
## 24450   C37A2.7        172401
## 32952 Y82E9BR.3        175299
## 33639   ZK813.3        180634
## 35378   C27H5.9            NA
## 60449  K02F2.11            NA
quantile(sleuth_table_txn$mean_obs, na.rm=TRUE)
##         0%        25%        50%        75%       100% 
##  0.7675163  3.5198016  5.7369968  7.8353703 14.8657162
quantile(sleuth_table_txn$var_obs, na.rm=TRUE)
##           0%          25%          50%          75%         100% 
##  0.001561586  0.071094574  0.203265678  0.902161600 27.826633120
filter(sleuth_table_txn, target_id %in% rownames(p.tpm$loadings)[order(abs(p.tpm$loadings[,1]))][1:20])
##          ens_gene   target_id
## 1  WBGene00219908    F20C5.10
## 2  WBGene00019880   R05D3.6.1
## 3  WBGene00022582   ZC262.5.1
## 4  WBGene00016655   C44E4.6.3
## 5  WBGene00007235  C01G10.8.1
## 6  WBGene00001168   F31E3.5.1
## 7  WBGene00001243  F41H10.7.1
## 8  WBGene00000479   C07H6.5.1
## 9  WBGene00001244  F41H10.8.1
## 10 WBGene00045153    F11A10.9
## 11 WBGene00004427  K11H12.2.1
## 12 WBGene00012179   W01D2.1.1
## 13 WBGene00000465  C07G2.1a.1
## 14 WBGene00004410 Y62E10A.1.2
## 15 WBGene00023020     F30H5.4
## 16 WBGene00023071    F56C9.12
## 17 WBGene00220165   Y47G6A.36
## 18 WBGene00014633     ZK897.4
## 19 WBGene00045139    F01D5.12
## 20 WBGene00045141   Y53F4B.47
##                                                                                          description
## 1                                                                                                   
## 2     Putative ATP synthase subunit epsilon, mitochondrial  [Source:UniProtKB/Swiss-Prot;Acc:P34539]
## 3     Putative ATP synthase subunit epsilon, mitochondrial  [Source:UniProtKB/Swiss-Prot;Acc:P34539]
## 4                       Acyl-CoA-binding protein homolog 1  [Source:UniProtKB/Swiss-Prot;Acc:O01805]
## 5  Activator of HSp90 (HSP90) Atpase (ATPase) activity homolog  [Source:UniProtKB/TrEMBL;Acc:Q93168]
## 6                                Elongation factor 1-alpha  [Source:UniProtKB/Swiss-Prot;Acc:P53013]
## 7      Elongation of very long chain fatty acids protein 5  [Source:UniProtKB/Swiss-Prot;Acc:Q20300]
## 8                         ATP-dependent RNA helicase cgh-1  [Source:UniProtKB/Swiss-Prot;Acc:Q95YF3]
## 9      Elongation of very long chain fatty acids protein 6  [Source:UniProtKB/Swiss-Prot;Acc:Q20303]
## 10                                                                                                  
## 11                               60S ribosomal protein L15  [Source:UniProtKB/Swiss-Prot;Acc:P91374]
## 12                                       Ribosomal protein L37  [Source:UniProtKB/TrEMBL;Acc:O62388]
## 13                              Chondroitin proteoglycan 1  [Source:UniProtKB/Swiss-Prot;Acc:Q17802]
## 14               Ribosomal protein, Large subunit, Acidic (P1)  [Source:UniProtKB/TrEMBL;Acc:Q9U1X9]
## 15                                                                                                  
## 16                                                                                                  
## 17                                                                                                  
## 18                                                                                                  
## 19                                                                                                  
## 20                                                                                                  
##     ext_gene entrezgene_id       pval      qval           b      se_b  mean_obs
## 1   F20C5.10          <NA> 0.05068472 0.2000746 -1.58155828 0.8093374  3.098760
## 2    R05D3.6        176178 0.12148685 0.3552581  0.34342551 0.2217705  5.483882
## 3    ZC262.5        191132 0.12148685 0.3552581  0.34342551 0.2217705  5.483882
## 4     acbp-1        172071 0.12811055 0.3668523  0.25467582 0.1673742  9.300039
## 5     ahsa-1        180017 0.14879226 0.4021858 -0.44276811 0.3066663  9.764503
## 6   eef-1A.1        175975 0.24474243 0.5357922 -0.41427756 0.3561489 10.715867
## 7      elo-5        177320 0.24937890 0.5412788 -0.24436169 0.2121454  9.587738
## 8      cgh-1        176061 0.25097970 0.5427560 -0.19436809 0.1693142 12.214112
## 9      elo-6        177321 0.35878858 0.6561640 -0.15603208 0.1700296 10.155169
## 10  F11A10.9          <NA> 0.56570210 0.8073614  0.21924588 0.3817003  4.591381
## 11    rpl-15        176891 0.60974626 0.8334502  0.07469306 0.1463320 12.282650
## 12   W01D2.1        175141 0.64650201 0.8536850 -0.14415613 0.3143210  9.719753
## 13     cpg-1        175586 0.67184603 0.8671536 -0.09196471 0.2170946 12.018032
## 14     rla-2        178297 0.67569946 0.8698780  0.51812703 1.2385346  6.641657
## 15   F30H5.4          <NA> 0.72149173 0.8934251  0.06515018 0.1827670  6.214460
## 16  F56C9.12          <NA> 0.82874076 0.9422149 -0.18572648 0.8585853  2.861803
## 17 Y47G6A.36          <NA> 0.88381951 0.9641391  0.06355846 0.4349474  4.351655
## 18   ZK897.4          <NA>         NA        NA          NA        NA        NA
## 19  F01D5.12          <NA>         NA        NA          NA        NA        NA
## 20 Y53F4B.47          <NA>         NA        NA          NA        NA        NA
##       var_obs     tech_var     sigma_sq smooth_sigma_sq final_sigma_sq
## 1  1.58019661 0.3868862708  0.622900911      0.92316793     0.92316793
## 2  0.09462516 0.0214216121  0.049660719      0.07694267     0.07694267
## 3  0.09462516 0.0214216121  0.049660719      0.07694267     0.07694267
## 4  0.06655559 0.0060546863  0.049973580      0.01903660     0.04997358
## 5  0.21723116 0.0562966440  0.131791839      0.01991235     0.13179184
## 6  0.26647941 0.1259864136  0.127697596      0.02409404     0.12769760
## 7  0.09421336 0.0383721861  0.051639188      0.01949982     0.05163919
## 8  0.05993791 0.0003430060  0.056991566      0.04112017     0.05699157
## 9  0.05651613 0.0018706787  0.055949467      0.02119982     0.05594947
## 10 0.08722494 0.0902814662 -0.004541950      0.20110883     0.20110883
## 11 0.03734597 0.0004484745  0.041262143      0.04237762     0.04237762
## 12 0.10112001 0.1777970473 -0.066750700      0.01979837     0.01979837
## 13 0.08321084 0.0005755057  0.093684635      0.03781595     0.09368463
## 14 2.70636107 1.0639896436  2.003946395      0.02858724     2.00394639
## 15 0.03039315 0.0282369105  0.005806920      0.03857061     0.03857061
## 16 0.50842750 0.3471005907  0.234566722      1.12723668     1.12723668
## 17 0.09079038 0.1132386610 -0.008663105      0.26511991     0.26511991
## 18         NA           NA           NA              NA             NA
## 19         NA           NA           NA              NA             NA
## 20         NA           NA           NA              NA             NA
#repeating the plot without removing any transcripts makes no real difference
p.tpm0<- pca(sleuth_matrix.tpm, metadata=coldata, removeVar=0)
## -- removing the lower 0% of variables based on variance
dim(p.tpm0$loadings) #should give about 600 transcripts
## [1] 61428     8
biplot(p.tpm0, colby="condition", labSize=4.0, widthConnectors=0.2, title="Kallisto TPM; all transcripts included")

## WHAT HAPPENS IF WE LOG THE VALUES FOR est_counts? 
#finally, repeat the PCA but this time using log2-transformed values for the est_counts
p<- pca(log2(sleuth_matrix+0.5), metadata=coldata, removeVar=0.99)
## -- removing the lower 99% of variables based on variance
dim(p$loadings) #should give about 600 transcripts
## [1] 614   8
#this is essentially the same PCA as by doing: prcomp(t(sleuth_matrix)), scale=FALSE, center=TRUE)

screeplot(p)

#in this case the PC2 separates the samples
biplot(p, colby="condition", labSize=4.0, widthConnectors=0.2, title="Kallisto log2(est_counts)")

pairsplot(p, colby= "condition")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

plotloadings(p)
## -- variables retained:
## R03G5.1a.1, Y66D12A.15.3, Y113G7B.23c.1, C26H9A.2c.1, ZK1307.8.4, C07A12.4a.2, Y49A3A.2.2, Y66H1B.2c.3, F36A2.1a.2, F25C8.3e.1, F22D6.5b.1, F10D11.1.2, F48E8.1a.1, C35C5.1.1, Y75B8A.22.1, F53F4.14.1, F13B9.5.1, M03A8.2.2, C25A11.4l.1, C46F11.2b.1, F26H11.5.1, Y110A7A.1b.1, C34E10.5.3, F52G2.1c.1, Y79H2A.3f.1, C25A1.10a.2, Y45F10A.6g.1, F18C12.2a.1

#get the top 20 for PC1 and PC2
rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20]
##  [1] "T19B10.3.2"   "Y48A6B.11b.1" "Y51F10.2a.1"  "F28F8.6.2"    "ZK524.2h.1"  
##  [6] "F35C8.7c.2"   "Y105E8B.8a.3" "F55D10.1.2"   "C52E12.2b.1"  "C23H4.1.4"   
## [11] "T24A11.1b.1"  "C05D9.1.1"    "B0350.2c.1"   "B0414.7b.1"   "T11G6.2a.1"  
## [16] "F54E7.3c.1"   "F56A6.2a.1"   "F13B9.5.2"    "F33D11.9d.1"  "Y110A7A.1b.1"
#or getting all the info about these transcripts
t2g[t2g$target_id %in% rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20],]
##             ens_gene    target_id
## 545   WBGene00000277    C23H4.1.4
## 2258  WBGene00001334 Y105E8B.8a.3
## 3179  WBGene00001833 Y110A7A.1b.1
## 3505  WBGene00002040   F56A6.2a.1
## 3912  WBGene00002239    F13B9.5.2
## 5049  WBGene00003472   B0414.7b.1
## 5054  WBGene00003476  T24A11.1b.1
## 5853  WBGene00003918   F54E7.3c.1
## 7531  WBGene00004927    C05D9.1.1
## 9411  WBGene00006446    F28F8.6.2
## 10041 WBGene00006752   ZK524.2h.1
## 10225 WBGene00006780   B0350.2c.1
## 10474 WBGene00006831  C52E12.2b.1
## 17778 WBGene00011717   T11G6.2a.1
## 17972 WBGene00011832   T19B10.3.2
## 19655 WBGene00012972 Y48A6B.11b.1
## 26594 WBGene00018006  F33D11.9d.1
## 26640 WBGene00018037   F35C8.7c.2
## 27842 WBGene00018877   F55D10.1.2
## 32055 WBGene00021763  Y51F10.2a.1
##                                                                                         description
## 545                                                                                                
## 2258                Endoplasmic reticulum oxidoreductin-1  [Source:UniProtKB/Swiss-Prot;Acc:Q7YTU4]
## 3179                   HoloCentric chromosome binding Protein  [Source:UniProtKB/TrEMBL;Acc:Q95Y84]
## 3505                       Heavy chain, Unconventional Myosin  [Source:UniProtKB/TrEMBL;Acc:H2L082]
## 3912                           Kinase suppressor of Ras A  [Source:UniProtKB/Swiss-Prot;Acc:G5EFD2]
## 5049                                       MTK1/MEKK4 homolog  [Source:UniProtKB/TrEMBL;Acc:O01837]
## 5054                       Myotubularin-related protein 3  [Source:UniProtKB/Swiss-Prot;Acc:Q22712]
## 5853                     Partitioning defective protein 3  [Source:UniProtKB/Swiss-Prot;Acc:Q17353]
## 7531                                            Sorting NeXin  [Source:UniProtKB/TrEMBL;Acc:Q9GYJ9]
## 9411                                     Ataxin-3 homolog  [Source:UniProtKB/Swiss-Prot;Acc:O17850]
## 10041 Phorbol ester/diacylglycerol-binding protein unc-13  [Source:UniProtKB/Swiss-Prot;Acc:P27715]
## 10225                                            AO49 ankyrin  [Source:UniProtKB/TrEMBL;Acc:Q7JP86]
## 10474                        Kinesin-like protein unc-104  [Source:UniProtKB/Swiss-Prot;Acc:P23678]
## 17778                                                                                              
## 17972                                      Beta-galactosidase  [Source:UniProtKB/TrEMBL;Acc:Q27526]
## 19655             Regulator of spindle assembly protein 2  [Source:UniProtKB/Swiss-Prot;Acc:G5EFV3]
## 26594                                                                                              
## 26640                  Choline transporter-like protein 1  [Source:UniProtKB/Swiss-Prot;Acc:Q20026]
## 27842                                       Alpha-mannosidase  [Source:UniProtKB/TrEMBL;Acc:Q20829]
## 32055                                                                                              
##       ext_gene entrezgene_id
## 545      cab-1        181382
## 2258     ero-1        173325
## 3179     hcp-6        266830
## 3505     hum-7        171650
## 3912     ksr-1        181082
## 5049     mtk-1        172246
## 5054     mtm-3        175490
## 5853     par-3        175783
## 7531     snx-1        180468
## 9411     atx-3        180049
## 10041   unc-13        172497
## 10225   unc-44        177366
## 10474  unc-104        174144
## 17778  T11G6.2        177977
## 17972   bgal-1        188583
## 19655    rsa-2        176536
## 26594    hpo-3        172256
## 26640   chtl-1        180809
## 27842   aman-1        180749
## 32055 Y51F10.2        245998
rownames(p$loadings)[order(abs(p$loadings[,2]))][1:20]
##  [1] "F46F2.2a.1"  "F35C8.7a.2"  "F18C12.2a.1" "Y79H2A.3f.1" "B0218.1.4"  
##  [6] "M03D4.1d.1"  "T24B8.7g.1"  "C14B9.6c.3"  "F29G6.3a.2"  "R13H4.4d.1" 
## [11] "R12B2.1a.2"  "T21D12.9a.1" "R09B5.11.1"  "R06F6.9.1"   "F32D1.5b.3" 
## [16] "C13G3.3b.1"  "R11E3.8a.2"  "ZK1307.8.2"  "C43G2.1a.2"  "F32D1.1a.1"
t2g[t2g$target_id %in% rownames(p$loadings)[order(abs(p$loadings[,2]))][1:20],]
##             ens_gene   target_id
## 1733  WBGene00001058  R11E3.8a.2
## 1904  WBGene00001153   R06F6.9.1
## 2776  WBGene00001565  C14B9.6c.3
## 3358  WBGene00001978  R13H4.4d.1
## 3827  WBGene00002203  F46F2.2a.1
## 6833  WBGene00004378 F18C12.2a.1
## 7414  WBGene00004858  R12B2.1a.2
## 10736 WBGene00006974  M03D4.1d.1
## 11623 WBGene00007554  C13G3.3b.1
## 14169 WBGene00009259  F29G6.3a.2
## 18225 WBGene00011980  T24B8.7g.1
## 20578 WBGene00013580 Y79H2A.3f.1
## 21512 WBGene00014249  ZK1307.8.2
## 22336 WBGene00015047   B0218.1.4
## 24635 WBGene00016610  C43G2.1a.2
## 26541 WBGene00017981  F32D1.1a.1
## 26553 WBGene00017984  F32D1.5b.3
## 26637 WBGene00018037  F35C8.7a.2
## 29422 WBGene00019979  R09B5.11.1
## 30301 WBGene00020649 T21D12.9a.1
##                                                                                                         description
## 1733                                    Dipeptidyl Peptidase Four (IV) family  [Source:UniProtKB/TrEMBL;Acc:Q9TYX1]
## 1904                                                      Enoyl-CoA Hydratase  [Source:UniProtKB/TrEMBL;Acc:Q09603]
## 2776                                       Nuclear receptor corepressor 1  [Source:UniProtKB/Swiss-Prot;Acc:P34333]
## 3358                                     Alpha-catenin-like protein hmp-1  [Source:UniProtKB/Swiss-Prot;Acc:P90947]
## 3827                                        Casein kinase I isoform delta  [Source:UniProtKB/Swiss-Prot;Acc:Q20471]
## 6833                 Endocytosis protein RME-8; Receptor Mediated Endocytosis  [Source:UniProtKB/TrEMBL;Acc:G5ED36]
## 7414                                                        Dwarfin sma-4  [Source:UniProtKB/Swiss-Prot;Acc:P45897]
## 10736                                                    Kinesin-like protein  [Source:UniProtKB/TrEMBL;Acc:Q8MQ03]
## 11623              Serine/threonine protein phosphatase 2A regulatory subunit  [Source:UniProtKB/TrEMBL;Acc:A9UJN4]
## 14169                                                                                                              
## 18225                                   Ubiquitin carboxyl-terminal hydrolase  [Source:UniProtKB/TrEMBL;Acc:N1NVC2]
## 20578                                                                                                              
## 21512                                                                                                              
## 22336                                        Fatty acid amide hydrolase 1  [Source:UniProtKB/Swiss-Prot;Acc:Q17449]
## 24635                        Progestin and adipoQ receptor-like protein 1  [Source:UniProtKB/Swiss-Prot;Acc:Q94177]
## 26541                                             Fidgetin-like protein 1  [Source:UniProtKB/Swiss-Prot;Acc:O16299]
## 26553                                                       GMP reductase  [Source:UniProtKB/Swiss-Prot;Acc:O16294]
## 26637                                  Choline transporter-like protein 1  [Source:UniProtKB/Swiss-Prot;Acc:Q20026]
## 29422                             Facilitated glucose transporter homolog  [Source:UniProtKB/Swiss-Prot;Acc:O44616]
## 30301 Leucine-rich repeats and immunoglobulin-like domains protein sma-10  [Source:UniProtKB/Swiss-Prot;Acc:Q965M2]
##       ext_gene entrezgene_id
## 1733     dpf-5        177251
## 1904     ech-4        174665
## 2776     gei-8        176145
## 3358     hmp-1        179624
## 3827    kin-20        181620
## 6833     rme-8        172594
## 7414     sma-4        175815
## 10736    zen-4        177374
## 11623   pptr-2        179526
## 14169   hpo-34        181340
## 18225  T24B8.7        174440
## 20578 Y79H2A.3        176638
## 21512 ZK1307.8        174515
## 22336   faah-1        177613
## 24635   paqr-1        177416
## 26541   figl-1        178829
## 26553  F32D1.5        178834
## 26637   chtl-1        180809
## 29422 R09B5.11        178634
## 30301   sma-10        176849
#NOTE: when log2 is used with est_counts, it will be the lower counts that dominate the PCA as they will have the larger variance due to Poisson noise
quantile(sleuth_table_txn$mean_obs, na.rm=TRUE)
##         0%        25%        50%        75%       100% 
##  0.7675163  3.5198016  5.7369968  7.8353703 14.8657162
quantile(sleuth_table_txn$var_obs, na.rm=TRUE)
##           0%          25%          50%          75%         100% 
##  0.001561586  0.071094574  0.203265678  0.902161600 27.826633120
filter(sleuth_table_txn, target_id %in% rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20])
##          ens_gene    target_id
## 1  WBGene00006780   B0350.2c.1
## 2  WBGene00002239    F13B9.5.2
## 3  WBGene00002040   F56A6.2a.1
## 4  WBGene00018006  F33D11.9d.1
## 5  WBGene00001334 Y105E8B.8a.3
## 6  WBGene00001833 Y110A7A.1b.1
## 7  WBGene00011832   T19B10.3.2
## 8  WBGene00006446    F28F8.6.2
## 9  WBGene00003476  T24A11.1b.1
## 10 WBGene00003472   B0414.7b.1
## 11 WBGene00000277    C23H4.1.4
## 12 WBGene00004927    C05D9.1.1
## 13 WBGene00018877   F55D10.1.2
## 14 WBGene00018037   F35C8.7c.2
## 15 WBGene00012972 Y48A6B.11b.1
## 16 WBGene00003918   F54E7.3c.1
## 17 WBGene00006831  C52E12.2b.1
## 18 WBGene00021763  Y51F10.2a.1
## 19 WBGene00006752   ZK524.2h.1
## 20 WBGene00011717   T11G6.2a.1
##                                                                                      description
## 1                                             AO49 ankyrin  [Source:UniProtKB/TrEMBL;Acc:Q7JP86]
## 2                           Kinase suppressor of Ras A  [Source:UniProtKB/Swiss-Prot;Acc:G5EFD2]
## 3                       Heavy chain, Unconventional Myosin  [Source:UniProtKB/TrEMBL;Acc:H2L082]
## 4                                                                                               
## 5                Endoplasmic reticulum oxidoreductin-1  [Source:UniProtKB/Swiss-Prot;Acc:Q7YTU4]
## 6                   HoloCentric chromosome binding Protein  [Source:UniProtKB/TrEMBL;Acc:Q95Y84]
## 7                                       Beta-galactosidase  [Source:UniProtKB/TrEMBL;Acc:Q27526]
## 8                                     Ataxin-3 homolog  [Source:UniProtKB/Swiss-Prot;Acc:O17850]
## 9                       Myotubularin-related protein 3  [Source:UniProtKB/Swiss-Prot;Acc:Q22712]
## 10                                      MTK1/MEKK4 homolog  [Source:UniProtKB/TrEMBL;Acc:O01837]
## 11                                                                                              
## 12                                           Sorting NeXin  [Source:UniProtKB/TrEMBL;Acc:Q9GYJ9]
## 13                                       Alpha-mannosidase  [Source:UniProtKB/TrEMBL;Acc:Q20829]
## 14                  Choline transporter-like protein 1  [Source:UniProtKB/Swiss-Prot;Acc:Q20026]
## 15             Regulator of spindle assembly protein 2  [Source:UniProtKB/Swiss-Prot;Acc:G5EFV3]
## 16                    Partitioning defective protein 3  [Source:UniProtKB/Swiss-Prot;Acc:Q17353]
## 17                        Kinesin-like protein unc-104  [Source:UniProtKB/Swiss-Prot;Acc:P23678]
## 18                                                                                              
## 19 Phorbol ester/diacylglycerol-binding protein unc-13  [Source:UniProtKB/Swiss-Prot;Acc:P27715]
## 20                                                                                              
##    ext_gene entrezgene_id        pval       qval           b     se_b mean_obs
## 1    unc-44        177366 0.001463698 0.01481936  5.03306203 1.581835 3.414347
## 2     ksr-1        181082 0.105855253 0.32695448 -3.48457088 2.154818 2.935111
## 3     hum-7        171650 0.109653317 0.33470561  3.49763201 2.186357 2.229330
## 4     hpo-3        172256 0.111081842 0.33724694 -3.44013593 2.159059 2.137390
## 5     ero-1        173325 0.111532269 0.33770854  3.53290187 2.220075 2.365086
## 6     hcp-6        266830 0.115536196 0.34571721 -4.15334872 2.639076 2.847424
## 7    bgal-1        188583 0.121276485 0.35502790  3.88599406 2.508002 2.628493
## 8     atx-3        180049 0.327906125 0.62781288 -2.18484553 2.233217 2.283616
## 9     mtm-3        175490 0.382496351 0.67778238  2.24459925 2.570232 3.969596
## 10    mtk-1        172246 0.710478150 0.88816775 -1.09186496 2.941336 3.657801
## 11    cab-1        181382 0.841087031 0.94659276  0.50514161 2.519369 4.326480
## 12    snx-1        180468 0.842984258 0.94717668 -0.59891670 3.023645 2.685658
## 13   aman-1        180749 0.876144898 0.96123011 -0.45787971 2.937799 2.721932
## 14   chtl-1        180809 0.882680362 0.96349770  0.36602444 2.480307 2.693936
## 15    rsa-2        176536 0.906500062 0.97188504  0.30321605 2.581566 2.253511
## 16    par-3        175783 0.925036035 0.97777186 -0.24556365 2.609824 2.188864
## 17  unc-104        174144 0.942914782 0.98376401 -0.18951163 2.646557 2.850742
## 18 Y51F10.2        245998 0.993419412 0.99763789  0.02787542 3.379807 3.377456
## 19   unc-13        172497          NA         NA          NA       NA       NA
## 20  T11G6.2        177977          NA         NA          NA       NA       NA
##      var_obs tech_var  sigma_sq smooth_sigma_sq final_sigma_sq
## 1  11.527123 3.819028  1.185379       0.7017787       1.185379
## 2  11.429052 3.776768  5.509715       1.0604238       5.509715
## 3  11.689821 4.466722  5.093593       1.8505943       5.093593
## 4  11.372503 5.797378  3.525697       1.9790858       3.525697
## 5  12.015368 5.042340  4.815124       1.6723276       4.815124
## 6  16.868179 4.818131  9.111310       1.1407393       9.111310
## 7  15.097545 3.657732  8.922421       1.3628281       8.922421
## 8   9.913455 7.513567  2.460947       1.7777063       2.460947
## 9  12.764227 6.506484  6.705706       0.4068814       6.705706
## 10 15.171686 4.130494 13.172417       0.5606838      13.172417
## 11 10.953854 4.838679  7.855761       0.2729035       7.855761
## 12 15.775221 3.256951 15.027906       1.3018105      15.027906
## 13 14.855325 6.105803 11.155525       1.2642113      11.155525
## 14 10.584430 2.992747  9.311097       1.2931537       9.311097
## 15 11.451100 5.720018  7.608952       1.8178592       7.608952
## 16 11.693536 4.938084  8.684274       1.9063537       8.684274
## 17 12.017571 5.873822  8.134705       1.1376122       8.134705
## 18 19.582673 6.020286 16.825906       0.7248982      16.825906
## 19        NA       NA        NA              NA             NA
## 20        NA       NA        NA              NA             NA
#the distribution of est_counts (not logged) for the important loadings in PC1
quantile(filter(so[["obs_norm"]], target_id %in% rownames(p$loadings)[order(abs(p$loadings[,1]))][1:20])$est_counts)
##           0%          25%          50%          75%         100% 
## 0.000000e+00 8.916271e-05 1.499096e+01 5.880107e+01 2.882697e+02
#the distribution of est_counts (not_logged) across all transcripts
quantile(so[["obs_norm"]]$est_counts)
##           0%          25%          50%          75%         100% 
##     0.000000     0.000000     0.000000     5.034019 36437.874329
#Some additional exploratory analysis using sleuth's own functions
plot_pca(so,
         color_by = 'condition',
         text_labels = TRUE)

#the two types of samples are separated but only just (the SGs cluster together though)
plot_pca(so,
          color_by = 'condition',
          text_labels = TRUE, units="tpm")

plot_pca(so,
         pc_x=2,
         pc_y=3,
         color_by = 'condition',
         text_labels = TRUE)

plot_pca(so,
         pc_x=2,
         pc_y=8,
         color_by = 'condition',
         text_labels = TRUE)

#how much of the variance is accounted for by each PC?
plot_pc_variance(obj = so, units = "est_counts")

plot_pc_variance(obj = so, units = "tpm")

#note PC8 also separates the samples
plot_loadings(so, pc_input=1)

plot_loadings(so, pc_input=2)

plot_loadings(so, pc_input=3)

plot_loadings(so, pc_input=8) 

#note that all influential genes in PCA seem to be very highly expressed (over 1000 tpm)

plot_group_density(so,
                   use_filtered = FALSE,
                   units = "est_counts",
                   trans = "log",
                   grouping = "condition")
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `summarise_()` is deprecated as of dplyr 0.7.0.
## Please use `summarise()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

plot_group_density(so,
                   use_filtered = TRUE,
                   units = "est_counts",
                   trans = "log",
                   grouping = "condition")

plot_sample_heatmap(so)

#check whether there are sex differences
#list of sex determining genes is from: 
#https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2619291/pdf/324.pdf
list.sex_genes <- c("xol-1", "sdc-1", "sdc-2", 
                    "her-1", "tra-2", "tra-3", 
                    "fem-1", "fem-2", "fem-3", "tra-1")

plot_transcript_heatmap(so, 
                        transcripts=dplyr::filter(sleuth_table_txn, 
                                                  ext_gene %in% list.sex_genes)$target_id)

#repeat the plot with est_counts instead of tpm
plot_transcript_heatmap(so, 
                        transcripts=dplyr::filter(sleuth_table_txn, 
                                                  ext_gene %in% list.sex_genes)$target_id,
                        units="est_counts")


#ZK287.8b.1 seems to separate two of the normal samples but it's a transcript that 
#is not used in differential expression due to filtering - below are original counts
#note that basic filter in sleuth filters out any transcripts with fewer than 5 estimated counts
#in 47% of the samples (effectively here half of the samples)
so$obs_raw[so$obs_norm$target_id == "ZK287.8b.1",]
##         target_id est_counts  eff_len len      tpm sample
## 480169 ZK287.8b.1   0.000000 54.84584 196  0.00000  N1_S5
## 480170 ZK287.8b.1   3.000000 51.71815 196 11.69946  N2_S6
## 480171 ZK287.8b.1   3.825966 52.92062 196 16.27609  N3_S7
## 480172 ZK287.8b.1   0.000000 51.52067 196  0.00000  N4_S8
## 480173 ZK287.8b.1   0.000000 55.92125 196  0.00000 SG1_S1
## 480174 ZK287.8b.1   0.000000 53.04100 196  0.00000 SG2_S2
## 480175 ZK287.8b.1   0.000000 54.34783 196  0.00000 SG3_S3
## 480176 ZK287.8b.1   0.000000 53.99656 196  0.00000 SG4_S4

A note on the variability of PCA plots We have plotted a number of PCA plots in the code above. Most look similar but there are also some striking differences. What drives most of the differences is that some of the plots have used log-transformed values of counts and others haven't. The est_counts in the so object's obs_norm and obs_raw data frames are not log normalised (however when they are shown in the results table, the mean observations are indeed logged (in this case transformed by log2 as this is how we built the object)). Hence, when we are using est_counts extracted with sleuth_to_matrix, these counts are not in a log scale. Some are indeed very large and will dominate the PCA plot (they also have very large variance in RNA-seq). If we log these values, then the lower counts will dominate the PCA as they will have the larger variance due to Poisson noise.

Functional analysis

Now that we have a list of significant differentially expressed genes (DGEs), we can try to gain biological insight by visualising our results in a proper context. We will use mainly the library clusterProfiler for this, which has a variety of options for viewing the over-represented Gene Ontology (GO) terms. The layout of this part of the code is largely based on the tutorial found here.

Quick GO over-representation analysis using goseq

Below we apply goseq to find overrepresented GO categories among differentially expressed genes. We will use the genes from the aggregated p-value sleuth run.

#collect the results of differential expression
#first the background list of genes (this will all be filtered genes in the analysis i.e. genes with no NAs in their qvalues)
bg <- dplyr::filter(sleuth_table_wt, !is.na(qval))$ext_gene  #there should be 11337 genes in the background

#then select significant genes with qval <= 0.05
sig.q005 <- (dplyr::filter(sleuth_table_wt, qval <= 0.05))$ext_gene  #2328 genes

t.q005.vector <- as.integer(bg %in% sig.q005)  #vector that specifies which genes are in the DE set and which not (length 11337)
names(t.q005.vector) <- dplyr::filter(sleuth_table_wt, !is.na(qval))$ext_gene
#sanity check
table(t.q005.vector)
## t.q005.vector
##    0    1 
## 9009 2328
#calculate the object containing the gene names, DE calls and probability weighting function.
pwf.q005=nullp(t.q005.vector,"ce6","geneSymbol")
## Loading ce6 length data...
## Warning in pcls(G): initial point very close to some inequality constraints

subset(pwf.q005, !is.na(pwf.q005$pwf)) -> pwf.q005.noNA
q005.GO.wall.BP=goseq(pwf.q005.noNA,"ce6","geneSymbol", test.cats = c("GO:BP"))  
## Fetching GO annotations...
## For 2362 genes, we could not find any categories. These genes will be excluded.
## To force their use, please run with use_genes_without_cat=TRUE (see documentation).
## This was the default behavior for version 1.15.1 and earlier.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
#q005.GO.wall.BP[(p.adjust(q005.GO.wall.BP$over_represented_pvalue,method="BH")<.05),c(2,3,6)]

q005.GO.wall.MF=goseq(pwf.q005,"ce6","geneSymbol", test.cats = c("GO:MF"))
## Fetching GO annotations...
## For 3017 genes, we could not find any categories. These genes will be excluded.
## To force their use, please run with use_genes_without_cat=TRUE (see documentation).
## This was the default behavior for version 1.15.1 and earlier.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
#q005.GO.wall.MF[(p.adjust(q005.GO.wall.MF$over_represented_pvalue,method="BH")<.05),c(2,3,6)]

q005.GO.wall.CC=goseq(pwf.q005,"ce6","geneSymbol", test.cats = c("GO:CC"))
## Fetching GO annotations...
## For 3017 genes, we could not find any categories. These genes will be excluded.
## To force their use, please run with use_genes_without_cat=TRUE (see documentation).
## This was the default behavior for version 1.15.1 and earlier.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
#q005.GO.wall.CC[(p.adjust(q005.GO.wall.CC$over_represented_pvalue,method="BH")<.05),c(2,3,6)]


#filter for FDR<0.05 and min/max number of genes in each category
q005.GO.wall.BP.filtered <- q005.GO.wall.BP
q005.GO.wall.BP.filtered$adjp <- p.adjust(q005.GO.wall.BP$over_represented_pvalue,method="BH")
q005.GO.wall.BP.filtered <- dplyr::filter(q005.GO.wall.BP.filtered, adjp < 0.05)
q005.GO.wall.BP.filtered <- dplyr::filter(q005.GO.wall.BP.filtered, (numInCat < 500) & (numInCat>5)) 
q005.GO.wall.BP.filtered
##      category over_represented_pvalue under_represented_pvalue numDEInCat
## 1  GO:0009057            1.683530e-08                1.0000000        112
## 2  GO:0044265            4.939462e-08                1.0000000        103
## 3  GO:0006396            1.224281e-07                1.0000000        100
## 4  GO:0016071            1.232634e-07                1.0000000         73
## 5  GO:0000375            2.300860e-06                0.9999992         37
## 6  GO:0000377            2.300860e-06                0.9999992         37
## 7  GO:0000398            2.300860e-06                0.9999992         37
## 8  GO:0006397            5.111194e-06                0.9999979         49
## 9  GO:0051603            6.058874e-06                0.9999972         71
## 10 GO:0043632            9.956360e-06                0.9999953         63
## 11 GO:0000956            1.530280e-05                0.9999967         20
## 12 GO:0008380            1.690233e-05                0.9999937         38
## 13 GO:0030163            1.690969e-05                0.9999915         77
## 14 GO:1901565            1.771629e-05                0.9999904         96
## 15 GO:0044257            2.591567e-05                0.9999868         71
## 16 GO:0010498            4.831554e-05                0.9999777         51
## 17 GO:0019941            5.768751e-05                0.9999714         59
## 18 GO:0043161            6.131046e-05                0.9999721         48
## 19 GO:0008298            6.395180e-05                1.0000000          6
## 20 GO:0006099            6.446277e-05                0.9999953          9
## 21 GO:0006101            6.446277e-05                0.9999953          9
## 22 GO:0072350            6.446277e-05                0.9999953          9
## 23 GO:0006511            6.686710e-05                0.9999671         57
## 24 GO:0034470            2.257391e-04                0.9998889         49
## 25 GO:0006508            3.042429e-04                0.9998040        115
## 26 GO:0034660            3.362983e-04                0.9998270         52
##    numInCat
## 1       342
## 2       313
## 3       308
## 4       204
## 5        88
## 6        88
## 7        88
## 8       132
## 9       214
## 10      186
## 11       38
## 12       98
## 13      243
## 14      320
## 15      222
## 16      149
## 17      180
## 18      139
## 19        6
## 20       12
## 21       12
## 22       12
## 23      173
## 24      150
## 25      420
## 26      164
##                                                                                    term
## 1                                                       macromolecule catabolic process
## 2                                              cellular macromolecule catabolic process
## 3                                                                        RNA processing
## 4                                                                mRNA metabolic process
## 5                                       RNA splicing, via transesterification reactions
## 6  RNA splicing, via transesterification reactions with bulged adenosine as nucleophile
## 7                                                        mRNA splicing, via spliceosome
## 8                                                                       mRNA processing
## 9                            proteolysis involved in cellular protein catabolic process
## 10                               modification-dependent macromolecule catabolic process
## 11                                           nuclear-transcribed mRNA catabolic process
## 12                                                                         RNA splicing
## 13                                                            protein catabolic process
## 14                                            organonitrogen compound catabolic process
## 15                                                   cellular protein catabolic process
## 16                                                proteasomal protein catabolic process
## 17                                     modification-dependent protein catabolic process
## 18                    proteasome-mediated ubiquitin-dependent protein catabolic process
## 19                                                      intracellular mRNA localization
## 20                                                             tricarboxylic acid cycle
## 21                                                            citrate metabolic process
## 22                                                 tricarboxylic acid metabolic process
## 23                                        ubiquitin-dependent protein catabolic process
## 24                                                                     ncRNA processing
## 25                                                                          proteolysis
## 26                                                              ncRNA metabolic process
##    ontology         adjp
## 1        BP 9.560767e-05
## 2        BP 1.402560e-04
## 3        BP 1.750032e-04
## 4        BP 1.750032e-04
## 5        BP 1.633323e-03
## 6        BP 1.633323e-03
## 7        BP 1.633323e-03
## 8        BP 2.638770e-03
## 9        BP 2.867362e-03
## 10       BP 4.282198e-03
## 11       BP 5.589489e-03
## 12       BP 5.589489e-03
## 13       BP 5.589489e-03
## 14       BP 5.589489e-03
## 15       BP 7.358756e-03
## 16       BP 1.143267e-02
## 17       BP 1.185325e-02
## 18       BP 1.185325e-02
## 19       BP 1.185325e-02
## 20       BP 1.185325e-02
## 21       BP 1.185325e-02
## 22       BP 1.185325e-02
## 23       BP 1.186682e-02
## 24       BP 3.373612e-02
## 25       BP 4.430244e-02
## 26       BP 4.658141e-02
q005.GO.wall.MF.filtered <- q005.GO.wall.MF
q005.GO.wall.MF.filtered$adjp <- p.adjust(q005.GO.wall.MF$over_represented_pvalue,method="BH")
q005.GO.wall.MF.filtered <- dplyr::filter(q005.GO.wall.MF.filtered, adjp < 0.05)
q005.GO.wall.MF.filtered <- dplyr::filter(q005.GO.wall.MF.filtered, (numInCat <500) & (numInCat>5)) 
q005.GO.wall.MF.filtered
## [1] category                 over_represented_pvalue  under_represented_pvalue
## [4] numDEInCat               numInCat                 term                    
## [7] ontology                 adjp                    
## <0 rows> (or 0-length row.names)
q005.GO.wall.CC.filtered <- q005.GO.wall.CC
q005.GO.wall.CC.filtered$adjp <- p.adjust(q005.GO.wall.CC$over_represented_pvalue,method="BH")
q005.GO.wall.CC.filtered <- dplyr::filter(q005.GO.wall.CC.filtered, adjp < 0.05)
q005.GO.wall.CC.filtered <- dplyr::filter(q005.GO.wall.CC.filtered, (numInCat < 500) & (numInCat>5)) 
q005.GO.wall.CC.filtered
##      category over_represented_pvalue under_represented_pvalue numDEInCat
## 1  GO:0005681            3.919986e-09                1.0000000         41
## 2  GO:0000502            1.386358e-05                0.9999968         21
## 3  GO:1905369            3.742007e-05                0.9999905         21
## 4  GO:1905368            4.354055e-05                0.9999871         25
## 5  GO:0000228            6.284587e-05                0.9999684         61
## 6  GO:0098687            6.751741e-05                0.9999735         36
## 7  GO:0000777            8.079771e-05                0.9999811         18
## 8  GO:0000779            9.359660e-05                0.9999754         20
## 9  GO:0000974            1.052725e-04                0.9999967          7
## 10 GO:0071013            1.090325e-04                0.9999721         19
## 11 GO:0005694            1.195907e-04                0.9999254        115
## 12 GO:0044454            3.043036e-04                0.9998469         50
## 13 GO:0005581            3.687851e-04                0.9998547         29
## 14 GO:0044427            4.271121e-04                0.9997356         92
## 15 GO:0000775            4.456871e-04                0.9998306         27
## 16 GO:0005839            4.849177e-04                0.9999395          9
## 17 GO:0120114            4.925507e-04                0.9998328         22
## 18 GO:0005654            5.210810e-04                0.9996823         84
## 19 GO:0005657            5.989602e-04                0.9998583         14
## 20 GO:0000793            7.017511e-04                0.9996487         44
## 21 GO:0071014            1.063306e-03                0.9999005          7
## 22 GO:0000776            1.221421e-03                0.9995499         22
## 23 GO:0005684            1.266295e-03                0.9996635         14
## 24 GO:0043596            1.462030e-03                0.9996913         11
## 25 GO:0044451            1.679169e-03                0.9989979         64
##    numInCat                                     term ontology         adjp
## 1        82                     spliceosomal complex       CC 2.925289e-07
## 2        41                       proteasome complex       CC 8.712164e-04
## 3        43                    endopeptidase complex       CC 2.127599e-03
## 4        55                        peptidase complex       CC 2.363065e-03
## 5       181                       nuclear chromosome       CC 3.262520e-03
## 6        91                       chromosomal region       CC 3.358991e-03
## 7        35         condensed chromosome kinetochore       CC 3.858899e-03
## 8        41 condensed chromosome, centromeric region       CC 4.298244e-03
## 9         8                            Prp19 complex       CC 4.649459e-03
## 10       39             catalytic step 2 spliceosome       CC 4.649459e-03
## 11      396                               chromosome       CC 4.693817e-03
## 12      150                  nuclear chromosome part       CC 1.101026e-02
## 13       76                          collagen trimer       CC 1.295086e-02
## 14      316                         chromosomal part       CC 1.457062e-02
## 15       67           chromosome, centromeric region       CC 1.478196e-02
## 16       14                  proteasome core complex       CC 1.547646e-02
## 17       53           Sm-like protein family complex       CC 1.547646e-02
## 18      285                              nucleoplasm       CC 1.595309e-02
## 19       28                         replication fork       CC 1.787896e-02
## 20      130                     condensed chromosome       CC 2.043636e-02
## 21       10   post-mRNA release spliceosomal complex       CC 3.022827e-02
## 22       54                              kinetochore       CC 3.314493e-02
## 23       30             U2-type spliceosomal complex       CC 3.359904e-02
## 24       21                 nuclear replication fork       CC 3.794923e-02
## 25      214                         nucleoplasm part       CC 4.265803e-02

GO over-representation analysis with clusterProfiler (using gene-level sleuth results)

Much of this code has been adapted from Diego's analysis. We apply the various clusterProfiler functions to explore the data.

# Run GO enrichment analysis and save results 
ego <- enrichGO(gene = sleuth_significant_wt$target_id, 
                universe = dplyr::filter(sleuth_table_wt, !is.na(qval))$target_id,
                keyType = "ENSEMBL",
                OrgDb = org.Ce.eg.db, 
                ont = "ALL",
                pAdjustMethod = "BH", 
                qvalueCutoff = 0.05, 
                readable = TRUE)

cluster_summary <- data.frame(ego)
write.csv(cluster_summary, "results/clusterProfiler_ALL_neurexin.csv")

ego <- enrichGO(gene = sleuth_significant_wt$target_id, 
                universe = dplyr::filter(sleuth_table_wt, !is.na(qval))$target_id,
                keyType = "ENSEMBL",
                OrgDb = org.Ce.eg.db, 
                ont = "BP",
                pAdjustMethod = "BH", 
                qvalueCutoff = 0.05, 
                readable = TRUE)

cluster_summary <- data.frame(ego)
write.csv(cluster_summary, "results/clusterProfiler_BP_neurexin.csv")

#enrichment dotplot
enrichment_dotplot <- clusterProfiler::dotplot(ego, 
                                               x="GeneRatio",
                                               showCategory = 30)
plot(enrichment_dotplot)

ggsave("results/enrichment_dotplot.svg", plot = enrichment_dotplot, path = ".",
       width = 30, height = 15, units = "cm")


#enrichment mapplot
#this plot visualises gene sets as a network 
#(mutually overlapping gene sets tend to cluster together)
clusterProfiler::emapplot(ego,
                          showCategory = 30)

GO over-representation analysis with clusterProfiler (using transcript-level sleuth results)

An additional plot that ClusterProfiler allows us to do is a category netplot. These plots are particularly useful for hypothesis generation in identifying genes that may be important to several of the most affected processes. These plots use fold change but this is not available in gene-level analysis in sleuth. So instead, we'll use transcript-based results and use beta values as the fold changes.

#netplot
# We need to rerun enrichGO with the transcript-based information instead
# This is any way more correct as bias corrections based on length are best done using
# the individual transcript info rather than some mean length for the gene
# first, create a df without duplicate genes (first row is kept and so most important transcript should be kept in the df)
df <- sleuth_significant_txn %>% distinct(ens_gene, .keep_all=TRUE) 
# check it worked
dim(df)
## [1] 2217   15
length(unique(sleuth_significant_txn$ens_gene)) == dim(df)[1]
## [1] TRUE
# now re-run enrichGO using these genes instead 

ego.txn <- enrichGO(gene = df$ens_gene, 
                universe = unique(dplyr::filter(sleuth_table_txn, !is.na(qval))$ens_gene),
                keyType = "ENSEMBL",
                OrgDb = org.Ce.eg.db, 
                ont = "BP",
                pAdjustMethod = "BH", 
                qvalueCutoff = 0.05, 
                readable = TRUE)

#repeat quickly the dotplot and emapplot to compare them to the ones produced from the gene table
#they should be very similar
enrichment_dotplot <- clusterProfiler::dotplot(ego.txn, 
                                               x="GeneRatio",
                                               showCategory = 30)
plot(enrichment_dotplot)

ggsave("results/enrichment_dotplot.svg", plot = enrichment_dotplot, path = ".",
       width = 30, height = 15, units = "cm")

#fig_emapplot
enrichment_mapplot <- clusterProfiler::emapplot(ego.txn,
                                                showCategory = 30)
plot(enrichment_mapplot)

ggsave("results/enrichment_mapplot.svg", plot = enrichment_mapplot, path = ".")
## Saving 7 x 5 in image
# A named character vector is needed to colour genes in the net plot
genes_foldchange <- df$b
names(genes_foldchange) <- df$ext_gene

# Netplot

enrichment_netplot <- cnetplot(ego.txn,
                               categorySize = "pvalue",
                               showCategory = 5,
                               foldChange = genes_foldchange,
                               vertex.label.font = 6, drop=TRUE)
plot(enrichment_netplot)

ggsave("results/enrichment_netplot.svg", plot = enrichment_netplot, path = ".",
       width = 40, height = 40, units = "cm")

#remove the gene names for clarity (showing more categories doesn't help much as they fall on top of each other)
enrichment_netplot <- cnetplot(ego.txn,
          categorySize = "pvalue",
          showCategory = 5,
          foldChange = genes_foldchange,
          vertex.label.font = 6, node_label="category")
plot(enrichment_netplot)

ggsave("results/enrichment_netplot_cleaner.svg", plot = enrichment_netplot, path = ".",
       width = 40, height = 40, units = "cm")

#showing 8 categories
cnetplot(ego.txn,
          categorySize = "pvalue",
          showCategory = 8,
          foldChange = genes_foldchange,
          vertex.label.font = 6, node_label="category")

#relationship between the GO terms identified (haven't found a way to make the writing bigger!)
go_graph <- plotGOgraph(ego.txn)
## 
## groupGOTerms:    GOBPTerm, GOMFTerm, GOCCTerm environments built.
## 
## Building most specific GOs .....
##  ( 3902 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 3906 GO terms and 8606 relations. )
## 
## Annotating nodes ...............
##  ( 6518 genes annotated to the GO terms. )

#goplot (relationships between GO categories)
goplot(ego.txn)

#upsetplot (requires enrichplot and ggupset) - number of genes overlapping different data sets
upsetplot(ego.txn)

#As centplots are very crowded, we can focus on subsets of interest
#First, focus on RNA terms
ego.sub <- ego.txn
descriptions <- ego.txn@result$Description
go_term1 <- which(base::grepl("RNA", descriptions) == TRUE)
#go_term2 <- which(base::grepl("splic", descriptions) == TRUE)

#ego.sub@result <- ego.sub@result[c(go_term1, go_term2), ]
ego.sub@result <- ego.sub@result[c(go_term1), ]

# Plotting terms of interest
enrichment_netplot.sub <- cnetplot(ego.sub,
                                categorySize = "pvalue",
                                foldChange = genes_foldchange,
                                showCategory = 5)
plot(enrichment_netplot.sub)

ggsave("enrichment_netplot.sub.svg", plot = enrichment_netplot.sub, path = "./results",
       width = 25, height = 20, units = "cm")

#Next, focus on "catabolic" processes
ego.sub <- ego.txn
descriptions <- ego.txn@result$Description
go_term1 <- which(base::grepl("catabolic", descriptions) == TRUE)

ego.sub@result <- ego.sub@result[c(go_term1), ]

enrichment_netplot.sub2 <- cnetplot(ego.sub,
                                categorySize = "pvalue",
                                foldChange = genes_foldchange,
                                showCategory = 5)
plot(enrichment_netplot.sub2)

ggsave("enrichment_netplot.sub2.svg", plot = enrichment_netplot.sub2, path = "./results",
       width = 25, height = 20, units = "cm")

#Next, focus on chromosome
ego.sub <- ego.txn
descriptions <- ego.txn@result$Description
go_term1 <- which(base::grepl("chromosom", descriptions) == TRUE)

ego.sub@result <- ego.sub@result[c(go_term1), ]

enrichment_netplot.sub3 <- cnetplot(ego.sub,
                                categorySize = "pvalue",
                                foldChange = genes_foldchange,
                                showCategory = 5)
plot(enrichment_netplot.sub3)

ggsave("enrichment_netplot.sub3.svg", plot = enrichment_netplot.sub3, path = "./results",
       width = 25, height = 20, units = "cm")

#put all the plots together using cowplot
cowplot::plot_grid(enrichment_netplot, enrichment_netplot.sub, enrichment_netplot.sub2, enrichment_netplot.sub3, ncol=2, labels=LETTERS[1:4])

#Finally, repeat using ALL categories - molecular function gives no results (when run on its
#own gives a few non-significant hits)
ego.txn <- enrichGO(gene = df$ens_gene, 
                universe = unique(dplyr::filter(sleuth_table_txn, !is.na(qval))$ens_gene),
                keyType = "ENSEMBL",
                OrgDb = org.Ce.eg.db, 
                ont = "ALL",
                pAdjustMethod = "BH", 
                qvalueCutoff = 0.05, 
                readable = TRUE)

enrichment_dotplot <- clusterProfiler::dotplot(ego.txn, 
                                               x="GeneRatio",
                                               showCategory = 30)
plot(enrichment_dotplot)

ggsave("results/enrichment_dotplot_ALL.svg", plot = enrichment_dotplot, path = ".",
       width = 30, height = 15, units = "cm")

GSEA analysis with clusterProfiler (using transcript-level sleuth results)

Analysis adapted from the original done by Diego

GSEA methods suggest that weaker but coordinated changes in sets of functionally related genes (i.e., pathways) can also have significant effects and might not be picked up by analysis of lists of genes that have been selected using what is essentially a random p-value cutoff. Thus, rather than setting an arbitrary threshold to identify ‘significant genes’, all genes are considered in the GSEA analysis.

To perform Gene Set Enrichment Analysis (GSEA) of KEGG gene sets, clusterProfiler requires the genes to be identified using Entrez IDs for all genes in our results dataset. We also need to remove the NA values and duplicates (due to gene ID conversion) prior to the analysis.

See also this blog: https://learn.gencore.bio.nyu.edu/rna-seq-analysis/over-representation-analysis/

** DO NOT RUN GSEA YET **

KEGG pathway enrichment with clusterProfiler

#check the kegg code for c.elegans
search_kegg_organism('cel', by='kegg_code')
##      kegg_code              scientific_name common_name
## 118        cel       Caenorhabditis elegans    nematode
## 3022      ccel        Clostridium cellulosi        <NA>
## 4202      bcel Bacteroides cellulosilyticus        <NA>
#entrez ids not recognised in this case
sigGenes <- sleuth_significant_txn %>% distinct(ext_gene, .keep_all=FALSE)
sigGenes[,1] <- paste("CELE", sigGenes[,1],sep = "_")
dim(sigGenes)
## [1] 2217    1
universe <- sleuth_table_txn %>% filter(!is.na(ext_gene) & !is.na(qval)) %>% distinct(ext_gene, .keep_all=FALSE) 
universe[,1] <- paste("CELE", universe[,1], sep="_")
dim(universe)
## [1] 11337     1
keg_res <- enrichKEGG(gene=sigGenes[,1], 
                      organism="cel",
                      keyType="kegg",
                      universe= universe[,1],
                      minGSSize = 2, #default is 10 
                      qvalueCutoff = 0.1) 
head(keg_res)
## [1] ID          Description GeneRatio   BgRatio     pvalue      p.adjust   
## [7] qvalue      geneID      Count      
## <0 rows> (or 0-length row.names)
#this does not appear to return any results, even if the minGSSize is set to 1 and the qvalue cutoff further relaxed.

#however, setting the universe to all C. elegans genes in KEGG returns three
#pathways
keg_res <- enrichKEGG(gene=sigGenes[,1], 
                      organism="cel",
                      keyType="kegg",
                      minGSSize = 2, #default is 10 
                      qvalueCutoff = 0.1) 
head(keg_res)
##                ID                       Description GeneRatio  BgRatio
## cel03008 cel03008 Ribosome biogenesis in eukaryotes     13/95  85/3494
## cel03040 cel03040                       Spliceosome     10/95 117/3494
## cel00240 cel00240             Pyrimidine metabolism      5/95  35/3494
##                pvalue     p.adjust       qvalue
## cel03008 2.863849e-07 1.661032e-05 1.386706e-05
## cel03040 1.102316e-03 3.196716e-02 2.668764e-02
## cel00240 2.273984e-03 4.396370e-02 3.670291e-02
##                                                                                                                                                                                   geneID
## cel03008 CELE_Y53C12B.1/CELE_C05C8.5/CELE_F55F10.1/CELE_C05D11.9/CELE_T25G3.3/CELE_ZK795.3/CELE_C53H9.2/CELE_B0280.9/CELE_Y39A1A.14/CELE_ZK632.14/CELE_C08B6.8/CELE_ZC434.4/CELE_C15C6.4
## cel03040                                   CELE_F19F10.9/CELE_T13H5.4/CELE_Y69A2AR.21/CELE_F32B6.3/CELE_M01E11.2/CELE_F33D11.10/CELE_M03F8.3/CELE_Y54G2A.75/CELE_Y108G3AL.2/CELE_F08B4.7
## cel00240                                                                                                             CELE_F40F8.1/CELE_K02D7.1/CELE_Y10G11A.1/CELE_ZK643.2/CELE_T24C12.3
##          Count
## cel03008    13
## cel03040    10
## cel00240     5

Visualisation of KEGG pathways with pathview

We can use the pathview function for visualising individual kegg pathways. The pathview function requires a numeric vector with gene IDs as names. It looks like the IDs it accepts in this case are Entrez ids? The colours in the following plots indicate low (green) or high (red) beta values for any transcript included in the analysis (can be significant or not). Note: unlikely to be that useful here as KEGG enrichment returned no results; but equivalent with reactome or something else might be useful

#pathview can be used to show which genes are significantly differentially expressed or log fold changes 
#can be added to the vector provided to give a range of colours on the plot


#sigGenes <- sleuth_significant_txn %>% dplyr::distinct(ext_gene, .keep_all=TRUE) %>% dplyr::select(ext_gene, b)
#keggGenes <- sigGenes$b
#names(keggGenes) <- paste("CELE", sigGenes[,1], sep="_")

#t <- sleuth_table_txn %>% filter(!is.na(ext_gene) & !is.na(qval)) %>% dplyr::distinct(entrezgene_id, .keep_all=TRUE) %>% dplyr::select(entrezgene_id, b)
#keggGenes <- t$b
#names(keggGenes) <- t$entrezgene_id

t <- sleuth_table_txn %>% filter(!is.na(ext_gene) & !is.na(qval)) %>% dplyr::distinct(ext_gene, .keep_all=TRUE) %>% dplyr::select(ext_gene, b)
keggGenes <- t$b
names(keggGenes) <- paste("CELE", t$ext_gene, sep="_")

#In the images below, I use a permissive 0.5 value for beta otherwise many
#genes would come out as grey
#NOTE: these can be misleading for various reasons: a) the differences
#may not be significant (I'm not using the significant only genes)
#b) I've only kept one isoform from each gene so maybe one isoform goes up but
#øthers go down...

# Output image for ribosome biogenesis KEGG pathway: "cel03008"
pv.out <- pathview(gene.data = keggGenes,
         pathway.id = "cel03008",
         species = "cel",
         gene.idtype = "kegg",  #default is entrez ids
         #gene.idtype = "entrez",
         #limit = list(gene = c(round(min(keggGenes)),
          #                     round(max(keggGenes))),
          #            cpd = 1),
         limit = list(gene = c(-0.5,
                               0.5),
                      cpd = 1),
         plot.col.key = TRUE,
         kegg.native=T,
         kegg.dir = "./results/")

# Output image for spliceosome KEGG pathway: "cel03040"
pv.out <- pathview(gene.data = keggGenes,
         pathway.id = "cel03040",
         species = "cel",
         gene.idtype = "kegg",  #default is entrez ids
         #gene.idtype = "entrez",
         #limit = list(gene = c(round(min(keggGenes)),
          #                     round(max(keggGenes))),
          #            cpd = 1),
         limit = list(gene = c(-0.5,
                               0.5),
                      cpd = 1),
         plot.col.key = TRUE,
         kegg.native=T,
         kegg.dir = "./results/")


# Output image for fatty acid elongation in mitochondria KEGG pathway: "cel00062"
pv.out <- pathview(gene.data = keggGenes,
         pathway.id = "cel00062",
         species = "cel",
         gene.idtype = "kegg",  #default is entrez ids
         #gene.idtype = "entrez",
         #limit = list(gene = c(round(min(keggGenes)),
          #                     round(max(keggGenes))),
          #            cpd = 1),
         limit = list(gene = c(-0.5,
                               0.5),
                      cpd = 1),
         plot.col.key = TRUE,
         kegg.native=T,
         kegg.dir = "./results/")

# Output image for  TCA KEGG pathway: "cel00020"
pv.out <- pathview(gene.data = keggGenes,
         pathway.id = "cel00020",
         species = "cel",
         #gene.idtype = "entrez",
         gene.idtype = "kegg",  #default is entrez ids
         #limit = list(gene = c(round(min(keggGenes)),
          #                     round(max(keggGenes))),
           #           cpd = 1),
         limit = list(gene = c(-0.5,
                               0.5),
                      cpd = 1),
         kegg.native=T, 
         kegg.dir = "./results/")

# Output image for  lysosome KEGG pathway: "cel04142"
pv.out <- pathview(gene.data = keggGenes,
         pathway.id = "cel04142",
         species = "cel",
         #gene.idtype = "entrez",
         gene.idtype = "kegg",  #default is entrez ids
         #limit = list(gene = c(round(min(keggGenes)),
          #                     round(max(keggGenes))),
           #           cpd = 1),
         limit = list(gene = c(-0.5,
                               0.5),
                      cpd = 1),
         both.dirs=list(gene=TRUE, cpd=FALSE),
         kegg.native=T, 
         kegg.dir = "./results/")

#tidy up (pathview prints out the annotated pathway pics in the current directory 
#but puts the original ones in the kegg.dir)
pathview_dir <- "results/pathview_kegg"
ifelse(!dir.exists(file.path("./", pathview_dir)), dir.create(file.path("./", pathview_dir)), FALSE)
## [1] FALSE
files_to_move<- list.files("./", pattern="*.pathview.png")
file.rename(files_to_move, paste(pathview_dir, files_to_move, sep="/"))
## [1] TRUE TRUE TRUE TRUE TRUE
pathview_dir <- "results/pathview_kegg"
files_to_move <- list.files("./results", pattern="cel*[png|xml]", full.names=F)
file.rename(paste("results/", files_to_move,sep="/"), paste(pathview_dir, files_to_move, sep="/"))
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Functional enrichment analysis with gProfiler (using transcript-level sleuth results)

#code follows: https://cran.r-project.org/web/packages/gprofiler2/vignettes/gprofiler2.html#gene-list-functional-enrichment-analysis-with-gost
#Analysis with gProfiler can also be done online at: 
#install.packages("gprofiler2")
#install.packages("treemap")
library(treemap)
library(gprofiler2)

#use genes rather than transcripts
df <- sleuth_significant_txn %>% distinct(ens_gene, .keep_all=TRUE) 
# check it worked
dim(df)
## [1] 2217   15
length(unique(sleuth_significant_txn$ens_gene)) == dim(df)[1]
## [1] TRUE
gostres <- gost(query = df$ext_gene, 
                                  organism = "celegans",
                                  ordered_query = F, 
                                  exclude_iea = T,   #exclude computational GO annotations 
                                  significant=TRUE, 
                                  evcodes = TRUE, #evidence codes for the results
                                  correction_method = "fdr", #tailor-made algorithm provided by gProfiler
                                  custom_bg = dplyr::filter(sleuth_table_txn, !is.na(qval))$target_id)
## Detected custom background input, domain scope is set to 'custom'
head(gostres$result)
##     query significant      p_value term_size query_size intersection_size
## 1 query_1        TRUE 1.205155e-07       400       2172               131
## 2 query_1        TRUE 4.284938e-07      5151       2172              1117
## 3 query_1        TRUE 1.049533e-06       358       2172               115
## 4 query_1        TRUE 1.049533e-06      3144       2172               715
## 5 query_1        TRUE 1.049533e-06      1411       2172               356
## 6 query_1        TRUE 1.049533e-06      2930       2172               671
##    precision    recall    term_id source
## 1 0.06031308 0.3275000 GO:0009057  GO:BP
## 2 0.51427256 0.2168511 GO:0008150  GO:BP
## 3 0.05294659 0.3212291 GO:0044265  GO:BP
## 4 0.32918969 0.2274173 GO:0008152  GO:BP
## 5 0.16390424 0.2523033 GO:0006725  GO:BP
## 6 0.30893186 0.2290102 GO:0044237  GO:BP
##                                      term_name effective_domain_size
## 1              macromolecule catabolic process                 11344
## 2                           biological_process                 11344
## 3     cellular macromolecule catabolic process                 11344
## 4                            metabolic process                 11344
## 5 cellular aromatic compound metabolic process                 11344
## 6                   cellular metabolic process                 11344
##   source_order                            parents
## 1         3705             GO:0043170, GO:1901575
## 2         3575                                   
## 3        12868 GO:0009057, GO:0044248, GO:0044260
## 4         3576                         GO:0008150
## 5         2736                         GO:0044237
## 6        12850             GO:0008152, GO:0009987
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           evidence_codes
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    IBA,IMP,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IMP,IDA IBA,IBA,IBA,IMP IGI,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IMP IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP IGI,IBA,IBA,IDA IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IMP IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,ISS,IBA,IMP IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IBA,IBA,IMP IBA,IBA,IBA,IMP,IGI ISS IBA,IMP IBA,IBA,IBA,IMP IBA,IBA,IBA,IMP,IBA,IMP,IBA,IBA,IBA,IBA,IMP,IBA,IBA,ISS,IBA,IBA,IMP,IMP,IBA,IDA IBA,IBA,IBA,IBA,IDA IGI IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IGI,IBA,IBA,IDA,IBA,IBA
## 2 HEP,IMP,ND,IMP IGI IBA,IBA,IMP,IBA,IDA IGI IBA,IDA IBA,IMP,IBA,IMP,IMP,IBA,IDA IMP IBA TAS,IBA,IBA,IMP,IBA,IMP,IMP IBA,IDA IMP IBA,IBA,IMP IGI IBA,IBA,IBA,IMP IGI,IMP IGI,IEP,IGI,IMP IBA,IBA,IBA,IMP,IMP IBA,IMP IBA,IBA,IDA ISS IBA,IMP HEP,IBA,IMP,IBA,IGI ISS,IBA,IGI IBA,IBA,IDA,IBA,IMP,IBA,IMP IBA,IBA,IBA,HEP,IMP IGI IEP IBA,IMP,IBA,IBA,IBA,IMP,IMP,IBA,IMP,IMP ISS IBA,IBA,IBA,IBA,IBA,ISS IBA,IMP IBA,IDA,IBA,HEP IBA,IMP,IBA,IBA,IDA IMP IGI IEP ISS IBA TAS,IBA,IMP IBA,IMP IBA,IMP,HEP,IMP IBA,IMP IGI,IDA IMP IBA,ISS IBA,IBA,IEP IBA,HEP,IBA,IBA,IDA,IBA,IBA,IDA IMP IBA,IMP,IBA,IEP IBA,HEP,IBA,IMP,IMP,IMP TAS,IMP,IBA,IGI,ND,IDA IMP IGI IBA,IBA,IMP IBA,HEP IBA,IGI,IMP IBA,ND,IDA IMP ISS IBA,IGI,HEP,IMP,IMP IGI,IBA,IBA,IMP,IBA,IMP IBA,IMP IBA,IBA,IMP IBA,IBA,IMP IBA,IBA,HEP,IBA,IMP ISS IBA NAS,IDA IBA,IDA IMP IGI IBA,IMP TAS,IMP IGI IBA,IMP IEP HEP IBA,IDA IMP IGI IEP,IBA,ISS IBA,ISS IBA,HEP,IBA,IBA,IBA,IGI,IMP,IMP IGI IBA,IMP,IMP,IGI,IMP IGI IBA,IBA,IMP IGI IBA,IMP,IBA,IBA,IBA,IBA,IBA,IMP,IMP,IBA,IBA,IBA,IMP IBA,IDA IMP IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IMP IGI IBA,IMP IGI IBA,IDA,IBA,HEP IBA,IBA,IDA,IDA IMP IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IMP IBA,HEP IBA,IBA,IDA,ND,IGI IBA,IMP IBA,IBA,IMP,IBA,IMP IGI,IMP IGI,IMP,IGI,IBA,IBA,IMP IBA,IMP IBA,IBA,IBA,IMP IBA,IMP,IBA,IBA,IBA,IMP,IBA TAS NAS,IBA,IMP IGI,IMP,IMP IGI IBA,IMP IGI IBA,IBA,IBA,IBA,IMP IBA,IBA,IMP,IMP IBA,IMP,IMP IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IDA IMP IGI IBA,IMP ISS,IDA IBA,IDA,IBA,IGI IBA,IMP IGI IBA,IBA,IGI IBA,IMP IBA,IMP IGI IBA,IMP ISS,IMP IGI,IBA,ISS IBA,IBA,IMP IGI IBA,IBA,IBA,IBA,IBA,IMP,ISS IBA,IMP IGI IBA TAS,IBA,IBA,IBA,IBA,IBA,IGI IBA,IBA,IBA,IMP IGI IBA,IMP,IBA,IBA,IBA,IMP IBA,IMP IBA,ISS,IMP,IBA,IBA,IBA,ISS IBA,HEP,IMP IBA,IBA,IBA,IBA,IMP IBA,IBA,HEP,IBA,IMP,IBA,IBA,IBA,IBA,IMP IGI IBA,IMP,IMP IGI,IBA,IBA,IGI ISS IBA NAS,IMP IGI,IDA IMP IGI IBA,IBA,IBA,IMP IGI IBA,IMP,IBA,IBA,IBA,IBA,IDA IMP IGI IBA,IBA,IBA,IMP,IDA IMP IGI IBA,IMP IGI IBA,IMP IGI,IMP,IBA,IBA,IGI IEP IBA,IBA,IDA IMP IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,IGI IBA,IBA,IMP IBA,IBA,IGI,IBA,IDA IBA,IBA,IBA,IMP IBA,IMP IGI,IBA,IMP ISS,IDA,IBA,IBA,IBA,IBA,IBA,IDA,IBA,IBA,IBA,IBA,IBA,IDA,IMP HEP IBA,ISS IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IMP,IMP,IGI ISS IBA NAS,IDA IMP ISS IBA,IBA,IBA,IBA,IMP IBA,IGI,IBA,IBA,IBA,IMP HEP,IBA,IMP IBA,IDA IMP IGI IBA,IDA,IBA,IBA,IMP,IMP IGI,IMP,IDA IMP IGI IBA,IMP NAS,IBA,IMP,IBA,IMP IBA,IMP IBA,IMP IBA,IBA,IBA,ISS,IMP,IBA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IMP,IBA,IMP IBA,IBA,ISS IBA,IMP IBA,IMP,IBA,IDA IMP IGI,IMP,IGI IBA,IGI IBA,IMP IBA,IBA,IBA,IMP IBA,IDA IBA,IBA,IMP,IGI IBA,IBA,IBA,IDA IMP IGI IBA,IDA,IMP,IBA,ISS IBA,IMP,IMP,IBA,IBA,IDA IMP IGI IBA TAS,IBA,IBA,IEP,IMP ISS IBA,IBA,IBA,IBA,ISS,IBA,IBA,IBA,IBA,IMP IBA,IDA,IMP IGI IBA,IDA IMP IBA,IMP,IBA,IMP IEP IBA,IMP IGI ISS IBA NAS,HEP,IMP IGI NAS,ND,IBA,IC,IDA IMP IBA,IBA,IBA,IBA,IMP IGI IBA,IBA,IDA IPI IMP ISS,IMP IGI IBA,IBA,HEP IBA,IMP IBA,ISS,HEP,IMP IBA,IBA,IMP IGI IBA,IBA,IBA,IMP IGI TAS,IMP IBA,IBA,IBA,IBA,IMP IGI ISS,IBA,IMP IGI,IDA IMP IGI IBA,IDA IMP IGI IBA,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP IBA,IMP IGI IBA,IMP,IBA,IBA,IBA,IMP,IMP IGI IBA,IBA,IBA,IMP IGI,HEP,ISS,IMP IGI IBA,ISS,IBA,IDA IBA,IBA,IDA IBA,IBA,IBA,IDA IMP IBA,IMP IGI IBA,IMP IGI IBA,IDA IMP,IDA,HEP IBA,IBA,NAS,IBA,IMP IGI IBA,IMP IBA,IBA,IDA IBA,IBA,IDA IMP IGI,IBA,IMP IBA,IBA,IGI ISS IBA,IMP HEP ISS IBA,IMP IGI IBA,IDA IMP IBA,IGI,IBA,IDA IBA,IMP,IMP IBA,IBA,IBA,IDA IBA,IMP IBA,IGI,IDA IMP ISS IBA,IMP IGI IBA,IMP IBA,IBA,IBA,IDA IMP IGI IBA,ISS IBA,IBA,IBA,IDA IMP IBA,IBA,IBA,IDA IBA,IBA,NAS,IMP IGI,IBA,IBA,IBA,IBA,IMP IGI,ND,IMP IBA,IDA IMP IBA,IMP IGI,IMP IBA,IBA,ISS IBA NAS,IMP,IMP IGI IBA,IMP,IBA,IDA IMP IGI IBA,ISS IBA,IBA,IBA,IBA NAS,IMP IBA,IMP ISS IBA,IMP IGI IBA,IMP,IMP IGI IBA,IMP IGI IBA,IMP IBA,IBA,IBA,IMP IGI IBA,IBA,IDA IMP IGI IBA,IDA IMP,IMP IBA IC,IBA,IMP IGI IBA,IBA,IBA,IMP IGI,IBA,IDA IMP IBA,IMP IGI IEP IBA,IBA,IMP ISS IBA,ISS IBA,IBA,IDA,IDA IMP IBA,IMP IBA,IBA,IBA,IMP,IDA IMP IGI ISS IBA TAS,IBA,IBA,IBA,IMP,IMP IGI IBA,IMP,IBA,IBA,IBA,IBA,IMP IBA,IDA IBA,IBA,IBA ND,IGI,IBA,IBA,IBA,IMP IBA,HEP IBA,IBA,IMP IGI,IMP,ISS,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP,IBA,IMP,IBA,IBA,IMP IBA,IMP,IBA,IBA,IBA,IDA IMP IGI IBA,IMP,IMP,IBA,ND,IMP IBA,IMP,IMP,IBA,IBA,IMP,IBA,IDA IMP IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IDA IBA,HEP,IBA,IBA,IBA,IMP IBA,IMP IGI IBA,IBA,IMP IGI,IBA,IBA,IMP IBA,IBA ND,IBA,IDA IMP IGI,IBA,IBA,IDA ISS IBA,IBA,IBA,IMP TAS,IBA,IMP,IMP IBA,IBA,IBA,IMP IGI IBA,IBA,IGI IBA,IBA,IBA,IBA,IMP IGI IBA,IMP IBA,IMP IBA,IBA,IBA,IBA,IMP,IMP,ISS IBA,IBA,IMP IGI,IMP IGI,ISS IBA,IMP IEP IBA,IMP,IBA,IGI HEP,IDA IBA,IMP,IBA,IBA,IBA,IDA IMP IGI ISS IBA,IBA,IBA,IDA IBA,IMP IGI,IBA,IBA,IBA,IMP IGI ISS IBA,IBA,IBA,IDA IMP IGI IBA,ISS IBA,IMP IBA,IEP IBA,IBA,IMP IGI IBA,IBA,IDA IBA TAS,IBA,IDA IMP IGI,IBA,IMP IGI,IDA IBA,IMP TAS,IDA IMP TAS,IMP IGI IBA,IBA,IBA,HEP IBA,IMP HEP IBA,IBA,IMP IBA,ND,IMP IBA,IGI,IMP IBA,IDA IMP IBA,IDA IMP IBA,HEP,IMP,IBA,IBA,IBA,IBA,IMP IGI ISS IBA,IBA,IMP,IMP IBA,IBA,IBA,IBA,IMP ISS IBA,IBA,IMP IBA,IBA,IDA IMP IGI TAS,IDA IMP IBA,IBA,IBA,IMP,IBA,IMP IGI IBA TAS,IMP IGI IBA,HEP,IBA,IMP IGI,IMP,IMP,IBA,IBA,IDA IMP IBA,IBA,IBA,IMP IGI ISS IBA,IBA,IMP IGI,IBA,IMP IGI,IDA,IMP IBA,IBA,IMP IGI IBA,IBA,IBA,IDA IMP IGI,ISS IBA,IBA,ISS,IMP IGI IBA,IMP,IMP TAS,IBA,IBA,IMP IBA,IMP IBA,IBA,IDA IMP IGI IBA,IDA IGI ISS IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IGI IBA,IMP,IBA,IBA,IBA,IGI,IMP IBA,IBA,IBA,IMP IGI,IBA,IDA IBA,IBA,IBA,IBA,IBA,IMP,IDA IBA,IBA,IBA,IMP IBA,IBA,IMP IGI IBA,IBA,IMP IBA,IDA IMP IBA,ISS IBA,ISS IBA,IMP IGI,IBA,IDA IMP IGI IBA,IBA,IBA,IBA,IBA,IGI,IBA,IBA,IDA IMP IGI ISS,IMP IGI NAS,IBA,IBA,IMP,IBA,IDA IMP IGI,IDA IMP IGI,IMP IGI IBA,ISS IBA,IBA,IDA IMP IGI,IMP IBA,IBA,IBA,IMP,IMP IGI,IDA IMP IGI ISS IBA,IBA,HEP,IDA IBA,IMP IGI ISS IBA,IMP IGI IBA,IMP IBA,IMP IGI IBA,IMP IBA,IDA IMP,IBA,IBA,IMP IGI IBA,IGI,IMP IBA,IDA IMP IGI IBA,IBA,IBA,IDA IMP IGI,IMP,IMP,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP,IMP,IGI IBA,IMP,IMP IGI,IBA,IBA,IBA,IBA,HEP,IMP IBA,ND,IMP,IMP IGI,IMP IBA,IBA,IMP,IBA,IMP IBA,IBA,IDA IMP IGI IBA,IBA,IBA,ISS,IMP IBA,IBA,IBA,IBA,IMP,IDA IMP IBA,IMP,IMP,IMP,IMP ISS IBA,IBA,IBA,IBA,HEP IBA,IBA,IBA,IMP IGI NAS,IBA,IMP IBA,IBA,IMP IGI IBA,IBA,IMP IGI IBA,IGI IBA,ISS IBA,IMP IBA,IBA,IMP IGI IBA,IBA,IBA,IBA,IBA,IMP,IBA,IDA IGI IBA,IMP IGI IBA,IMP IBA,IBA,IDA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IMP IGI ISS IBA,IBA,IMP,IBA,IMP IBA,IMP IGI IBA,IBA,IBA,IMP,IBA,IBA,ISS,IBA,IBA,IBA,ISS IBA NAS,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IGI IBA,ISS IBA,IMP IGI,IBA,IBA,IBA,IMP,IMP IBA,IMP ISS,IBA,IBA,IBA,IBA,IMP IGI IBA,IDA IMP IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IGI IBA,IBA,IBA,IBA,IDA IMP IBA,IBA,IMP IGI,IBA,IBA,IBA,IMP IGI,IBA,IBA,IMP IBA,IMP,IBA,IGI,IBA,IBA,IBA,IMP ISS,IBA,IBA,IMP IBA,IBA,IBA,IMP,ISS IBA,IBA,IMP IGI,IBA,IBA,IBA,IDA,IBA,IMP IGI IBA,IDA,IMP IBA,IBA,IBA,IMP IGI IBA,IMP IBA,IBA,IBA,IMP IBA TAS,IBA,IBA,IDA IGI,IBA,IBA,ISS IBA,IDA ISS IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,HEP,IBA,IBA,IBA,IBA
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IMP,IDA IBA,IBA,IBA,IMP IGI,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP IGI,IBA,IBA,IDA IBA,IMP IBA,IBA,IBA,IBA,IBA,IMP IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IBA,IBA,IMP IBA,IBA,IBA,IGI ISS IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,ISS,IBA,IBA,IMP,IMP,IBA,IDA IBA,IBA,IBA,IDA IGI IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IGI,IBA,IBA,IDA,IBA,IBA
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     IMP,IMP,IBA,IDA IBA,IDA,IMP,IBA,IMP,IBA,IDA IMP IBA TAS,IBA,IBA,IDA IBA,IBA,IGI,IMP IBA,IBA,IBA,IMP,IBA,IBA,IBA,IDA IBA,IBA,IBA,ISS,IBA,IBA,IDA,IBA,IBA,IMP IBA,IBA,IBA,IMP IGI IBA,IBA,IBA,IBA,IMP,IMP ISS IBA,IBA,IBA,IBA,IBA,ISS,IMP IBA,IDA,IBA,IBA,IBA,IDA ISS IBA TAS,IBA,IBA,IBA,IBA,IDA IBA,ISS IBA,IBA,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IGI,IDA IMP IBA,IBA,IBA,IBA,IGI,IMP IBA,ISS,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IBA,ISS IBA NAS,IDA IBA,IDA IMP,IBA,ISS IBA,ISS IBA,IBA,IMP,IMP,IGI,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IGI,IDA,IBA,IBA,IDA,IDA IMP,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IDA,IGI IBA,IMP,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IDA,ISS,IBA,IBA,IMP IGI IBA,IBA,IBA,IMP IBA,IMP IGI IBA,IBA,ISS IBA,IMP IGI IBA,IBA,IBA,ISS IBA,IGI IBA TAS,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,ISS,IBA,IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IGI IBA,IMP,IGI,IBA,IBA,ISS IBA NAS,IDA IMP IBA,IBA,IMP IGI IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP,IDA IBA,IMP IGI IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IDA IBA,IBA,IBA,IMP IBA,IMP IGI,IMP ISS,IDA,IBA,IBA,IBA,IBA,IDA,IBA,ISS IBA,ISS IBA,IBA,IBA,IMP,IGI ISS IBA NAS,ISS IBA,IBA,IMP IBA,IBA,IBA,IMP IBA,IDA IMP IGI IBA,IDA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,ISS,IBA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IBA,IMP IBA,IBA,ISS IBA,IBA,IBA,IDA,IGI IBA,IMP IBA,IBA,IBA,IGI IBA,IBA,IBA,IDA,ISS IBA,IMP,IBA,IBA,IMP IGI IBA TAS,IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IDA,IBA,IDA IMP IBA,IMP,ISS IBA NAS,IBA,IC,IDA IMP IBA,IBA,IBA,IBA,IBA,IDA ISS,IMP IBA,IBA,IBA,IMP IBA,ISS,IBA,IBA,IMP IBA,IBA,IDA IMP IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP,IBA,IBA,IBA,IGI IBA,IBA,IMP IGI,ISS,IBA,ISS,IDA IBA,IBA,IDA IBA,IBA,IBA,IDA IBA,IMP,IBA,NAS,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP,IBA,ISS IBA,ISS,IMP IBA,IDA IMP IBA,IDA IBA,IBA,IBA,IDA,IMP IBA,IDA ISS IBA,IBA,IMP IBA,IBA,IDA IBA,ISS IBA,IBA,IBA,IDA IMP IBA,IDA IBA,IBA,NAS,IBA,IBA,IBA,IBA,IMP,IBA,IDA IBA,IMP IBA,IBA,ISS IBA NAS,IMP,IMP,IDA IMP IBA,ISS IBA,IBA,IBA NAS,IBA,IBA,IMP,IMP IGI IBA,IBA,IMP IBA,IBA,IBA,IGI IBA,IBA,IDA,IBA,IMP IBA,IBA,IBA,IDA IMP IBA,IBA,ISS IBA,ISS IBA,IBA,IBA,IBA,IDA IMP ISS IBA,IBA,IBA,IMP,IMP IGI,IBA,IBA,IBA,IMP IBA,IDA IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IDA,IMP,IBA,IMP,IMP,IDA IMP IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IDA,IBA,IBA,IBA,IBA,IBA,IDA IGI,IBA,IBA,IDA ISS IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IMP IBA,IBA,IBA,IBA,IMP,IMP,ISS IBA,IBA,ISS IBA,IMP,IBA,IDA IBA,IBA,IBA,IDA IGI ISS IBA,IBA,IDA IBA,IMP IGI,IBA,IBA,IBA,IDA,ISS IBA,IMP IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IMP IBA,IBA,IMP IBA,IMP IBA,IDA IMP,IDA IMP IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IMP IBA,IBA,IDA IGI TAS,IDA IMP IBA,IBA,IBA,IBA,IMP IGI IBA,IBA,IMP IGI,IMP,IBA,IDA IMP IBA,IBA,IBA,IMP IGI ISS IBA,IDA,IMP IBA,IBA,IGI,IBA,IBA,ISS IBA,ISS,IBA,IBA,IBA,IBA,IDA IMP,IDA IGI ISS IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IBA,IBA,IMP IGI IBA,IMP IBA,IDA IMP IBA,ISS IBA,ISS IBA,IBA,IDA IMP IGI IBA,IBA,IBA,IBA,IGI,IBA,IBA,IBA,IDA,IDA,IBA,ISS IBA,IBA,IDA IMP IGI,IBA,IBA,IDA IBA,IMP ISS IBA,IBA,IMP IBA,IMP,IBA,IDA IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IBA,IDA IMP IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IDA IBA,IMP,IMP,IBA,IBA,IBA,NAS,IBA,IMP IBA,IBA,IMP IBA,IMP IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IDA IGI IBA,IBA,IMP IBA,IBA,IDA,IBA,IBA,IMP IBA,IBA,IMP ISS IBA,IBA,IBA,IBA,IBA,IBA,ISS,IBA,IBA,ISS IBA NAS,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,ISS IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP ISS,IBA,IMP IBA,IBA,IBA,IMP,ISS IBA,IBA,IMP IGI,IBA,IBA,IBA,IBA,IDA,IMP IBA,IMP IBA,IBA,IBA,IBA TAS,IBA,IBA,IDA,IBA,IDA,IBA,IBA,IBA,IBA,IBA,IBA,IBA
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP ISS IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IBA,ISS,IMP,IBA,IBA,IBA,ISS IBA NAS,IDA IBA,IBA,ISS IBA,ISS IBA,IBA,IMP,IGI,IBA,IBA,IMP,IBA,IBA,IBA,IMP IGI,IDA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IMP IGI IBA,IMP IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,ISS,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IGI,IBA,ISS IBA NAS,IBA,IBA,IBA,IBA,IBA,IMP,IMP IGI,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IMP IGI,ISS,IDA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IGI ISS IBA NAS,IBA,IBA,IMP IBA,IBA,IMP IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IDA,IGI IBA,IBA,IBA,IGI IBA,IBA,ISS IBA,IBA,IMP IGI IBA TAS,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IDA,ISS IBA NAS,IBA,IDA IMP IBA,IBA,IBA,IBA,IMP IBA,ISS,IBA,IMP IBA,IBA,IDA IMP IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IGI IBA,IMP IGI,IBA,ISS,IBA,IDA IBA,IMP,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP IBA,IDA IBA,ISS IBA,IBA,IBA,IBA,IBA,IMP IBA,ISS IBA NAS,IMP,ISS IBA,IBA,IBA NAS,IMP,IMP IGI IBA,IBA,IBA,IGI IBA,IDA,IMP IBA,IBA,IDA IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IDA,IDA ISS IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,ISS IBA,IBA,IDA,IBA,IBA,IBA,IBA,ISS IBA,IMP IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IMP IBA,IDA IMP,IDA IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IGI TAS,IDA IMP IBA,IBA,IBA,IBA,IBA,IMP IGI,IDA IMP IBA,IBA,IMP IGI,IBA,ISS IBA,ISS,IBA,IBA,IDA IGI ISS IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IMP,IMP IBA,IBA,IMP IBA,IMP,IMP IGI IBA,IGI,IBA,IBA,IDA,IDA,IBA,IDA IMP,IBA,IMP ISS IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IDA IMP IBA,IBA,IBA,IMP,IMP,IMP,IBA,IBA,NAS,IBA,IBA,IBA,IBA,IDA IGI IBA,IMP IBA,IMP ISS IBA,IBA,IBA,IBA,IBA,ISS IBA NAS,IBA,IBA,IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IMP IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IGI,IBA,IDA,IMP IBA,IBA,IBA,IBA,IDA,IBA,IBA,IBA
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 IMP,IBA,IDA IBA,IMP,IBA,IMP,IBA,IDA IBA TAS,IBA,IBA,IDA IBA,IBA,IGI,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IDA IBA,IBA,IBA,ISS,IBA,IDA,IBA,IBA,IMP IBA,IBA,IBA,IMP IGI IBA,IBA,IBA,IMP,IMP ISS IBA,IBA,IBA,IBA,IBA,ISS,IMP IBA,IDA,IBA,IBA,IBA,IDA ISS IBA TAS,IBA,IBA,IBA,IBA,IDA IBA,ISS IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IDA IMP IBA,IBA,IBA,IMP IBA,ISS,IMP,IBA,IBA,IBA,IBA,IBA,IBA,ISS IBA NAS,IDA IBA,IDA,IBA,ISS IBA,ISS IBA,IBA,IMP,IMP,IGI,IBA,IBA,IBA,IMP,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IGI,IDA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IDA,ISS,IBA,IBA,IGI IBA,IBA,IBA,IMP IBA,IMP IGI IBA,IBA,IBA,IMP IGI IBA,IBA,ISS IBA,IGI TAS,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,ISS,IBA,IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IGI IBA,IMP,IGI,IBA,IBA,ISS IBA NAS,IDA IMP IBA,IBA,IMP IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP,IDA IBA,IMP IGI IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP IBA,IMP IGI,ISS,IDA,IBA,IBA,IBA,IBA,IDA,IBA,ISS IBA,ISS IBA,IBA,IMP,IGI ISS IBA NAS,ISS IBA,IBA,IMP IBA,IBA,IBA,IMP IBA,IDA IMP IGI IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,ISS,IBA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IBA,IMP IBA,IBA,ISS IBA,IBA,IDA,IGI IBA,IMP IBA,IBA,IBA,IGI IBA,IBA,ISS IBA,IMP,IBA,IBA,IMP IGI IBA TAS,IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IDA,IBA,IDA IMP IBA,ISS IBA NAS,IBA,IC,IDA IMP IBA,IBA,IBA,IBA,IBA,IDA ISS,IMP IBA,IMP IBA,ISS,IBA,IBA,IMP IBA,IBA,IDA IMP IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IMP,IBA,IBA,IBA,IGI IBA,IBA,IMP IGI,ISS,IBA,ISS,IDA IBA,IBA,IDA IBA,IBA,IBA,IDA IBA,IMP,IBA,NAS,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IDA IMP IBA,IDA IBA,IBA,IBA,IDA,IMP IBA,IDA ISS IBA,IBA,IMP IBA,IBA,IDA IBA,ISS IBA,IBA,IBA,IDA IBA,IBA,IBA,NAS,IBA,IBA,IBA,IBA,IMP,IBA,IDA,IMP IBA,IBA,ISS IBA NAS,IMP,IMP,IDA IMP IBA,ISS IBA,IBA,IBA NAS,IBA,IBA,IMP,IMP IGI IBA,IBA,IMP IBA,IBA,IBA,IGI IBA,IBA,IDA,IBA,IMP IBA,IBA,IBA,IDA IMP IBA,IBA,ISS IBA,ISS IBA,IBA,IBA,IBA,IDA IMP ISS IBA,IBA,IBA,IMP,IMP IGI,IBA,IBA,IBA,IMP IBA,IDA IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IDA,IMP,IBA,IMP,IMP,IDA IMP IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IDA,IBA,IBA,IDA ISS IBA,IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP,IMP,ISS IBA,IBA,ISS IBA,IBA,IDA IBA,IBA,IBA,IDA IGI ISS IBA,IBA,IDA IBA,IMP IGI,IBA,IBA,IBA,IDA,ISS IBA,IMP IBA,IBA,IDA IBA,IBA,IBA,IBA,IMP IBA,IBA,IMP IBA,IMP IBA,IDA IMP,IDA IMP IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IMP IBA,IBA,IDA IGI TAS,IDA IMP IBA,IBA,IBA,IBA,IMP IGI IBA,IBA,IMP IGI,IMP,IBA,IDA IMP IBA,IBA,IBA,IMP IGI ISS IBA,IDA,IMP IBA,IBA,IGI,IBA,ISS IBA,ISS,IBA,IBA,IBA,IDA IMP,IDA IGI ISS IBA,IBA,IBA,IBA,IBA,IMP,IMP IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IBA,IBA,IMP IBA,IMP IBA,IDA IMP IBA,IBA,ISS IBA,IBA,IDA IMP IGI IBA,IBA,IBA,IBA,IGI,IBA,IBA,IBA,IDA,IDA,IBA,ISS IBA,IBA,IDA IMP,IBA,IBA,IBA,IMP ISS IBA,IBA,IMP IBA,IMP,IBA,IDA IBA,IBA,IBA,IBA,IBA,IMP,IBA,IBA,IBA,IBA,IBA,IMP,IBA,IMP IBA,IDA IMP IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IDA IBA,IMP,IMP,IBA,IBA,IBA,NAS,IBA,IMP IBA,IBA,IBA,IBA,ISS IBA,IBA,IBA,IBA,IBA,IDA IGI IBA,IMP IBA,IBA,IDA,IBA,IBA,IMP IBA,IBA,IMP ISS IBA,IBA,IBA,IBA,IBA,IBA,ISS,IBA,IBA,ISS IBA NAS,IBA,IBA,IBA,IBA,IBA,IBA,IBA,ISS IBA,IBA,IBA,IBA,IMP IBA,IBA,IBA,IBA,IMP IBA,IMP IBA,IBA,IBA,IBA,IBA,IBA,IDA IBA,IBA,IBA,IBA,IBA,IBA,IBA,IBA,IMP ISS,IMP IBA,IBA,IBA,IMP,ISS IBA,IBA,IMP IGI,IBA,IBA,IBA,IBA,IDA,IMP IBA,IMP IBA,IBA,IBA,IBA TAS,IBA,IBA,IDA,IBA,IDA,IBA,IBA,IBA,IBA,IBA,IBA,IBA
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              intersection
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         F57F5.1,aagr-1,cpr-5,K10C2.1,Y16B4A.2,cht-3,cpr-4,skr-8,cpl-1,ubc-7,skr-10,skr-7,skr-9,skr-13,asp-2,F41C3.5,asp-14,emb-27,ubc-12,exos-3,asp-8,swah-1,cht-1,cdk-1,hrg-7,cpz-2,cup-2,npl-4.1,ivd-1,csn-3,skr-15,K04C1.5,cyn-13,R02D3.8,F13D12.6,pbs-7,smg-5,F32A5.3,skr-2,pas-2,Y50D4B.4,asp-5,pas-5,bec-1,smg-4,rpn-12,sup-46,rpb-4,rbx-1,rpt-4,pbs-3,bath-41,nipi-3,rpm-1,C27F2.10,C16C8.4,ubc-18,smg-6,F32B6.3,rnh-2,skr-14,asp-1,F10E7.5,crn-3,cpr-1,F56D2.2,ubc-1,F10G7.10,rpn-8,cpr-6,lys-2,ubr-1,wwp-1,exos-7,C33G3.4,npl-4.2,lsm-1,pas-3,Y105E8A.2,B0393.3,rpb-7,zyg-11,ZC190.4,let-711,C04E12.4,ufd-2,ubl-1,ned-8,aagr-2,png-1,vps-28,pelo-1,vps-22,crn-5,edc-3,uggt-1,aagr-3,enpl-1,gsk-3,pas-4,rbx-2,exos-4.2,ari-1.4,sur-2,pbs-4,apc-17,rpt-1,sgt-1,panl-2,smg-8,kin-19,pbs-6,etc-1,exos-2,asp-4,spsb-2,crn-4,smg-7,rpn-5,F21D5.6,Y66D12A.9,mtr-4,vps-25,sel-11,F52C6.3,dlk-1,bath-42,Y71H2AM.3,wah-1,ari-1.3,T20D4.3
## 2 dod-19,sepa-1,ddo-3,cav-1,folt-2,skpo-1,F57F5.1,fat-7,msra-1,aagr-1,cpr-5,epg-2,dig-1,K10C2.1,gst-10,Y16B4A.2,dsl-2,myo-3,cht-3,vet-6,cogc-6,cpr-4,skr-8,lys-7,ilys-5,mrp-3,pat-2,nid-1,clec-52,acs-2,cpl-1,pfas-1,ubc-7,sod-2,skr-10,skr-7,F38B6.4,aco-1,dod-24,dhs-25,apl-1,W10C8.5,icl-1,skr-9,lys-4,C26B9.5,ges-1,zip-9,acs-1,T22D1.5,gna-2,nhr-2,ceh-49,ZK1055.7,lgg-1,mua-3,skr-13,asp-2,pgp-8,cle-1,mua-6,sucl-1,gas-1,cat-4,ldp-1,F41C3.5,F15E6.6,tps-2,cdc-25.3,tbh-1,pmt-2,ampd-1,asp-14,F56C9.7,ZK1321.4,F21D5.1,kin-29,W02B12.1,plc-3,emb-27,ddi-1,F08G2.5,rnr-2,klc-1,ubc-12,aptf-4,exos-3,lys-8,K08D8.5,gei-3,asp-8,snf-3,lig-1,swah-1,sqv-4,sdz-28,gyg-1,cht-1,F19C7.1,gpd-2,unc-85,unc-87,emo-1,inx-3,pcp-2,cyp-31A2,pes-10,cdk-1,sdha-1,iff-1,dsc-4,mut-14,rfc-4,clec-88,unc-37,htp-1,ZK6.11,rde-4,nmy-1,K08E4.2,hrg-7,let-805,cpz-2,cup-2,npl-4.1,fah-1,immt-1,tads-1,casy-1,psf-2,F54D5.4,mrp-2,tbx-9,tyms-1,tax-6,anc-1,baf-1,lys-1,spp-1,Y53C12B.1,dpyd-1,rrp-8,clec-186,C17H12.8,pfd-6,orc-4,lrp-2,ivd-1,egl-30,tdc-1,set-1,nyn-1,T16G12.1,memb-2,mdf-1,kbp-1,his-24,scrm-4,F35C8.5,mca-2,F44B9.8,F53C11.3,dhs-28,psmd-9,pgph-3,sucg-1,vha-6,csn-3,vha-11,skr-15,K04C1.5,snpc-1.1,F19F10.9,T14B4.3,tald-1,glit-1,lgg-3,cyn-13,damt-1,czw-1,R02D3.8,C01G10.7,app-1,ced-9,prp-31,F13D12.6,B0024.11,F54C8.4,pbs-7,snfc-5,trpp-5,msi-1,nmat-2,vha-20,gst-7,asb-2,hex-2,C02F12.5,hel-1,tni-1,col-96,asfl-1,acl-1,mab-31,sel-2,smg-5,sup-36,trcs-1,F43G6.5,dvc-1,cks-1,R06C1.4,R02F2.9,arx-1,pot-2,T22B2.1,ssup-72,F32A5.3,csc-1,cyn-9,atp-5,lrp-1,san-1,skr-2,gst-1,ZK550.3,pas-2,kxd-1,alp-1,E01A2.5,acdh-1,lam-2,toe-2,ndc-80,B0334.4,trm-1,Y50D4B.4,K05C4.5,prpf-4,pfk-1.1,asp-5,pas-5,K08D12.3,psf-1,aptf-2,suca-1,air-2,maoc-1,rpa-2,tns-1,pld-1,ver-3,bec-1,C27H6.9,rict-1,smg-4,slx-1,nst-1,ccdc-55,acl-7,cts-1,F39C12.1,polh-1,nasp-2,C01G6.5,pfkb-1.2,cdc-6,egg-3,pgk-1,puf-8,rpn-12,B0511.6,sld-5,slc-25A10,T05A12.4,lin-37,dgk-4,sup-46,cyb-2.2,lap-1,Y73C8C.8,pnn-1,T23G11.4,cids-1,emb-9,E01A2.2,inx-13,rpb-4,F55F10.1,erfa-1,ogdh-1,ZK1320.2,rbx-1,T13H5.4,C05D11.9,plrg-1,rpt-4,F52H2.6,dod-17,orc-5,set-4,ZK686.3,acl-3,C18B2.4,rpl-11.2,csn-5,syp-2,him-5,F29A7.6,idha-1,rsp-1,daf-25,sir-2.1,chaf-1,pri-1,isp-1,ptr-23,Y47D3A.14,F25H8.2,pbs-3,rnp-2,djr-1.1,hat-1,pgph-2,mfap-1,pam-1,smo-1,cls-1,lab-1,ZK524.4,got-2.2,pgp-1,rbm-3.2,sca-1,bath-41,T15H9.6,H03A11.2,T12A2.1,rpl-31,snr-1,pfn-2,F45H10.3,nipi-3,sars-2,hmg-3,ost-1,clpf-1,F46B6.4,Y54E2A.4,C37C3.2,rpm-1,C18B2.5,skp-1,tceb-3,E02D9.1,plpp-1.1,ZK1127.3,T01B7.5,tba-7,cdc-48.3,aps-3,gpd-3,mrp-5,tppp-1,hpo-9,spp-5,dct-17,lin-54,pgp-9,ggtb-1,T25G3.3,cec-8,thoc-5,C56C10.9,kca-1,glod-4,rsp-6,rpa-1,C27F2.10,zipt-13,rpl-11.1,isy-1,spe-5,taf-11.2,R04F11.5,abt-4,clec-41,C16C8.4,taf-9,ubc-18,hex-1,T26G10.1,hxk-1,tpp-2,sax-7,spd-1,zen-4,col-19,sdhd-1,spd-3,hpo-11,him-8,smg-6,lpin-1,cyn-10,slc-25A18.2,hphd-1,unc-14,F54D1.6,ZK973.1,F32B6.3,dagl-2,rnh-2,gly-13,dhfr-1,elli-1,C56G2.5,msh-2,skr-14,cdk-8,iff-2,pxn-2,ZK829.7,bas-1,knl-3,flh-3,pfn-3,coq-6,atln-1,F58G1.2,soap-1,ooc-5,mdt-20,vps-33.2,rtel-1,asp-1,F10E7.5,hmt-1,cah-4,rba-1,F07F6.4,W02B12.10,allo-1,plp-1,W04B5.5,K07C5.3,crn-3,ZK795.3,Y53G8AR.5,spp-2,nubp-1,emc-6,tgt-2,ZK632.4,ctf-8,nhr-114,F08F8.4,exo-1,Y38A10A.7,him-14,F40F8.1,unc-26,let-767,mlc-5,vha-19,ipla-2,rsp-3,F55G11.2,vha-12,ZK1320.7,Y53G8AR.9,nuo-5,pal-1,D1007.4,snu-23,Y47G6A.18,npp-5,cpr-1,ipla-1,mev-1,pcp-4,K12H4.7,cyc-1,ham-3,clec-62,cogc-3,T05G5.5,lon-2,his-41,F56D2.2,ida-1,trt-1,B0361.2,T10F2.2,ttc-17,lin-12,nkb-3,egg-2,eps-8,hif-1,mttu-1,Y73C8C.3,F49C12.11,Y51H4A.15,ubc-1,evl-18,tsen-2,aps-1,smp-2,phb-1,eef-1A.2,F10G7.10,snr-2,capg-2,mus-81,chp-1,rpn-8,elpc-1,K02D7.1,C16C10.2,gcy-28,cpsf-2,his-1,famk-1,pigv-1,aos-1,cyn-12,C27F2.4,clk-2,unc-80,lis-1,deb-1,frg-1,gst-24,rbc-1,nduo-5,pch-2,mec-15,bcat-1,B0511.7,mppa-1,cpr-6,abi-1,T24C4.5,emb-8,rpl-22,gst-20,lys-2,piki-1,mes-6,meg-2,K12H4.3,pps-1,spc-1,vab-10,cpsf-3,C28G1.4,cyn-5,cids-2,mex-6,cdk-7,raga-1,ubr-1,rpl-29,daf-41,clr-1,wdr-12,gst-39,leo-1,wwp-1,tag-273,trpp-4,phy-2,exos-7,nduo-6,vps-50,F10E9.5,C06G3.5,ucr-1,eif-2Balpha,fem-3,ZK177.1,cdk-11.2,gsto-1,unc-50,adr-2,C33G3.4,rsp-5,taf-10,dhc-1,rbbp-5,npp-22,bli-3,hpd-1,iars-1,nasp-1,his-3,lpd-8,eat-6,ppfr-4,mut-2,xpa-1,dyn-1,npl-4.2,cit-1.1,lsm-1,mys-1,pas-3,pat-3,bir-1,npp-20,dhhc-6,him-18,Y105E8A.2,F11A10.6,let-607,pars-1,sbp-1,unc-52,B0393.3,sir-2.4,ctf-4,F10G8.9,aqp-7,pmr-1,gad-1,scbp-2,rab-21,ptr-13,cdk-4,rpb-7,Y38F2AR.9,mdt-19,zyg-11,rab-10,gpx-5,acl-12,ZK643.2,his-18,asb-1,ragc-1,gly-2,ZC190.4,C47E8.4,max-1,mcm-5,art-1,tbx-11,let-711,srp-7,Y105E8B.9,lrk-1,act-5,myo-2,W01G7.4,aars-2,eme-1,efr-3,col-179,C04E12.4,gut-2,zipt-1,fzy-1,his-5,lpd-9,M03F8.3,hmgr-1,uvs-1,ubc-25,R07E5.7,snpc-1.2,let-19,zyg-1,ilcr-2,coq-4,snr-4,zip-8,snap-29,endu-2,nars-1,nol-16,dnj-21,phb-2,rpl-25.1,ufd-2,K07E3.4,acly-1,ZK1067.2,rpc-25,zmp-4,cku-80,trxr-2,C34H4.1,mrpl-20,nkb-1,F25G6.8,unc-23,clp-1,rpl-10,ftt-2,B0495.8,nhr-1,csq-1,ubl-1,zhit-1,glp-1,idh-2,row-1,tbp-1,his-28,his-50,him-3,clsp-1,gcna-1,epg-5,top-1,pir-1,npp-7,rad-54,cyb-2.1,let-721,Y48B6A.10,did-2,ned-8,trpp-3,rtcb-1,tag-124,xpd-1,dph-1,sod-3,aagr-2,moc-3,spcs-1,fln-1,mnr-1,dbr-1,pgp-4,sas-5,C50F4.16,apm-1,rpl-12,tac-1,rars-2,spcs-2,dlc-1,png-1,cyn-7,iars-2,rnf-113,gsp-1,abhd-3.2,clu-1,B0261.1,mec-12,spds-1,dsl-1,unc-22,cblc-1,fan-1,pgp-14,dtmk-1,unc-116,cdc-25.2,cuc-1,frm-1,arl-8,zmp-3,unc-44,asm-2,flr-1,hcp-3,glp-4,tftc-1,rbm-7,F18H3.4,fos-1,gtf-2H3,vps-28,nhr-5,him-10,hmg-4,ddx-23,tir-1,henn-1,F01D5.5,dpy-27,pelo-1,W09C5.1,F20D1.1,sesn-1,pqn-85,hgo-1,nep-2,unc-59,prdh-1,vps-22,aass-1,ptp-3,xdh-1,ruvb-1,hxk-2,cbp-1,crn-5,tba-4,taf-6.1,nra-4,edc-3,tra-1,uggt-1,clec-187,snr-3,bet-1,dif-1,aagr-3,enpl-1,col-178,sptf-3,Y39A1A.14,ppk-3,daf-31,npp-25,egg-1,vps-2,mel-11,msp-79,cox-4,puf-12,ksr-2,nxf-2,F58F12.1,mig-10,cwc-15,hasp-1,T24G10.2,abts-1,dnc-1,jph-1,eif-2Bgamma,B0336.13,cutc-1,ssl-1,mars-1,gsk-3,cdd-1,pas-4,F56B3.11,rbx-2,rei-1,gbf-1,exos-4.2,sel-9,ulp-4,scpl-4,lonp-2,cyn-1,mdl-1,csnk-1,mel-46,ran-4,nop-1,fkh-4,apn-1,Y57G11C.43,lsm-8,B0491.1,mtcu-1,atp-3,kcc-1,maph-1.1,ari-1.4,sur-2,F10B5.2,spg-7,T06E6.1,csb-1,ubc-9,imp-1,B0464.9,aspm-1,pbs-4,brc-1,apc-17,eif-2gamma,tag-30,ufc-1,smc-5,rpl-14,ZK688.9,cdf-1,apx-1,alkb-8,madf-8,dpy-17,F21D5.3,gld-3,cep-1,aph-1,rpt-1,fitm-2,lin-28,lam-1,pod-1,F45G2.9,ttll-12,mex-1,syd-2,inos-1,ugt-44,mppb-1,elpc-3,gsa-1,fzo-1,gcy-12,coq-8,laat-1,sgt-1,mfn-1,hgap-2,lin-61,snpn-1,akt-1,scm-1,thoc-3,gsnl-1,adbp-1,lin-52,skat-1,C43E11.9,mans-2,panl-2,F35G2.1,rps-8,syf-1,unc-18,smg-8,pak-2,erm-1,sos-1,suox-1,dscc-1,tbx-36,fum-1,C18E9.9,rps-15,scl-5,kbp-2,mtm-6,snx-6,vps-29,cup-14,mbk-1,kin-19,ran-5,mog-2,C16C10.8,T13F2.2,bag-1,eif-2beta,pbs-6,pkn-1,aco-2,cgef-1,etc-1,apg-1,syp-1,ram-2,wapl-1,ippk-1,T04F8.6,wht-2,cld-9,sucl-2,C16A3.6,lin-53,exos-2,atg-5,wars-1,egl-21,arp-6,asp-4,mxl-1,gfm-1,abtm-1,got-2.1,npp-1,T01G5.1,T26A8.4,pgp-10,cdh-1,sqv-1,spsb-2,crn-4,sma-5,smg-7,rpn-5,efk-1,T08A11.1,rom-5,C35A5.8,cyp-33C9,C25G4.2,amt-4,nuo-1,Y57G11C.3,smu-1,dyci-1,gon-1,Y39A1A.22,blos-1,age-1,F21D5.6,gfat-2,ctn-1,Y66D12A.9,mtr-4,Y71F9B.2,asg-2,T24B8.7,cox-17,rsp-4,aps-2,F45G2.10,col-76,prx-10,eral-1,F08B4.7,rpap-2,Y75B8A.31,snt-4,vps-25,sel-11,gars-1,sma-1,peli-1,phy-3,D2013.6,gpb-1,epg-3,him-1,pitr-1,tgt-1,riok-3,mys-2,smu-2,lev-11,Y53F4B.21,C37C3.18,F25H8.1,BE10.3,Y69A2AR.18,bud-31,spat-2,vglu-2,F53E10.6,zhp-3,syx-7,ZK430.5,thoc-2,hpk-1,K07A12.1,fzr-1,nhx-1,pole-2,eef-1B.2,nhl-2,dpy-14,sel-5,ran-1,txdc-9,jmjd-5,cyk-1,pyk-1,F31D4.2,F52C6.3,hda-4,C04B4.2,trpp-10,cand-1,phf-30,dph-3,tofu-6,mrpl-51,B0491.7,dlk-1,natb-1,bath-42,srp-6,pept-3,amt-1,rad-51,agt-1,drsh-1,mlp-1,vps-11,vps-18,kars-1,M18.6,gdh-1,mel-32,Y71H2AM.3,K11H12.1,wah-1,F37A4.2,his-64,lmtr-2,cyn-3,ech-6,lam-3,ari-1.3,nac-1,ZC434.4,syx-16,F39G3.5,T20D4.3,R04F11.3,set-10,C15C6.4
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              F57F5.1,cpr-5,K10C2.1,Y16B4A.2,cpr-4,skr-8,cpl-1,ubc-7,skr-10,skr-7,skr-9,skr-13,F41C3.5,emb-27,ubc-12,exos-3,cdk-1,cpz-2,cup-2,npl-4.1,ivd-1,csn-3,skr-15,K04C1.5,cyn-13,R02D3.8,F13D12.6,pbs-7,smg-5,F32A5.3,skr-2,pas-2,Y50D4B.4,pas-5,smg-4,rpn-12,sup-46,rpb-4,rbx-1,rpt-4,pbs-3,bath-41,nipi-3,rpm-1,C27F2.10,C16C8.4,ubc-18,smg-6,F32B6.3,rnh-2,skr-14,F10E7.5,crn-3,cpr-1,F56D2.2,ubc-1,F10G7.10,rpn-8,cpr-6,ubr-1,wwp-1,exos-7,C33G3.4,npl-4.2,lsm-1,pas-3,Y105E8A.2,B0393.3,rpb-7,zyg-11,ZC190.4,let-711,C04E12.4,ufd-2,ubl-1,ned-8,png-1,vps-28,pelo-1,vps-22,crn-5,edc-3,uggt-1,enpl-1,gsk-3,pas-4,rbx-2,exos-4.2,ari-1.4,pbs-4,apc-17,rpt-1,sgt-1,panl-2,smg-8,kin-19,pbs-6,etc-1,exos-2,spsb-2,crn-4,smg-7,rpn-5,F21D5.6,Y66D12A.9,mtr-4,vps-25,sel-11,F52C6.3,dlk-1,bath-42,Y71H2AM.3,wah-1,ari-1.3,T20D4.3
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            sepa-1,skpo-1,F57F5.1,fat-7,msra-1,aagr-1,cpr-5,epg-2,K10C2.1,gst-10,Y16B4A.2,cht-3,cpr-4,skr-8,acs-2,cpl-1,pfas-1,ubc-7,sod-2,skr-10,skr-7,F38B6.4,aco-1,dhs-25,W10C8.5,icl-1,skr-9,C26B9.5,ges-1,zip-9,T22D1.5,gna-2,nhr-2,ceh-49,lgg-1,skr-13,asp-2,sucl-1,gas-1,cat-4,ldp-1,F41C3.5,F15E6.6,tps-2,cdc-25.3,tbh-1,pmt-2,ampd-1,asp-14,F21D5.1,kin-29,W02B12.1,plc-3,emb-27,rnr-2,ubc-12,aptf-4,exos-3,gei-3,asp-8,lig-1,swah-1,sqv-4,gyg-1,cht-1,gpd-2,pcp-2,cyp-31A2,cdk-1,sdha-1,iff-1,dsc-4,mut-14,rfc-4,unc-37,rde-4,hrg-7,cpz-2,cup-2,npl-4.1,fah-1,tads-1,psf-2,tbx-9,tyms-1,tax-6,Y53C12B.1,dpyd-1,rrp-8,orc-4,ivd-1,tdc-1,nyn-1,T16G12.1,his-24,F35C8.5,F44B9.8,dhs-28,pgph-3,sucg-1,csn-3,skr-15,K04C1.5,snpc-1.1,F19F10.9,tald-1,lgg-3,cyn-13,damt-1,R02D3.8,C01G10.7,app-1,ced-9,prp-31,F13D12.6,B0024.11,F54C8.4,pbs-7,snfc-5,msi-1,nmat-2,gst-7,asb-2,hex-2,hel-1,asfl-1,acl-1,smg-5,trcs-1,F43G6.5,cks-1,R06C1.4,R02F2.9,pot-2,ssup-72,F32A5.3,atp-5,skr-2,gst-1,ZK550.3,pas-2,E01A2.5,trm-1,Y50D4B.4,K05C4.5,prpf-4,pfk-1.1,asp-5,pas-5,K08D12.3,psf-1,aptf-2,suca-1,air-2,maoc-1,rpa-2,ver-3,bec-1,C27H6.9,rict-1,smg-4,slx-1,acl-7,cts-1,polh-1,C01G6.5,pfkb-1.2,pgk-1,puf-8,rpn-12,B0511.6,sld-5,T05A12.4,lin-37,dgk-4,sup-46,cyb-2.2,T23G11.4,cids-1,E01A2.2,rpb-4,F55F10.1,erfa-1,ogdh-1,rbx-1,T13H5.4,C05D11.9,plrg-1,rpt-4,F52H2.6,orc-5,ZK686.3,acl-3,C18B2.4,csn-5,syp-2,him-5,F29A7.6,idha-1,rsp-1,sir-2.1,pri-1,isp-1,Y47D3A.14,pbs-3,rnp-2,djr-1.1,hat-1,pgph-2,mfap-1,pam-1,smo-1,ZK524.4,got-2.2,rbm-3.2,sca-1,bath-41,T15H9.6,T12A2.1,rpl-31,snr-1,F45H10.3,nipi-3,sars-2,clpf-1,F46B6.4,Y54E2A.4,C37C3.2,rpm-1,skp-1,tceb-3,E02D9.1,plpp-1.1,ZK1127.3,T01B7.5,cdc-48.3,gpd-3,lin-54,ggtb-1,cec-8,thoc-5,glod-4,rsp-6,rpa-1,C27F2.10,isy-1,taf-11.2,C16C8.4,taf-9,ubc-18,hex-1,T26G10.1,hxk-1,sdhd-1,hpo-11,smg-6,lpin-1,cyn-10,slc-25A18.2,hphd-1,F54D1.6,ZK973.1,F32B6.3,dagl-2,rnh-2,gly-13,dhfr-1,C56G2.5,msh-2,skr-14,cdk-8,iff-2,ZK829.7,bas-1,flh-3,coq-6,F58G1.2,mdt-20,rtel-1,asp-1,F10E7.5,cah-4,W02B12.10,allo-1,W04B5.5,K07C5.3,crn-3,ZK795.3,Y53G8AR.5,nubp-1,emc-6,tgt-2,ZK632.4,nhr-114,F08F8.4,exo-1,Y38A10A.7,him-14,F40F8.1,unc-26,let-767,ipla-2,rsp-3,Y53G8AR.9,nuo-5,pal-1,D1007.4,snu-23,npp-5,cpr-1,ipla-1,mev-1,pcp-4,K12H4.7,cyc-1,ham-3,T05G5.5,F56D2.2,trt-1,B0361.2,hif-1,mttu-1,F49C12.11,Y51H4A.15,ubc-1,evl-18,tsen-2,phb-1,eef-1A.2,F10G7.10,snr-2,mus-81,rpn-8,elpc-1,C16C10.2,gcy-28,cpsf-2,famk-1,pigv-1,aos-1,cyn-12,C27F2.4,clk-2,lis-1,gst-24,nduo-5,pch-2,mec-15,bcat-1,B0511.7,mppa-1,cpr-6,T24C4.5,emb-8,rpl-22,gst-20,lys-2,piki-1,mes-6,pps-1,cpsf-3,C28G1.4,cyn-5,cids-2,cdk-7,raga-1,ubr-1,rpl-29,clr-1,wdr-12,gst-39,leo-1,wwp-1,phy-2,exos-7,nduo-6,F10E9.5,C06G3.5,ucr-1,eif-2Balpha,fem-3,cdk-11.2,gsto-1,adr-2,C33G3.4,rsp-5,taf-10,rbbp-5,bli-3,hpd-1,iars-1,his-3,lpd-8,ppfr-4,mut-2,xpa-1,dyn-1,npl-4.2,cit-1.1,lsm-1,mys-1,pas-3,bir-1,dhhc-6,him-18,Y105E8A.2,pars-1,sbp-1,B0393.3,sir-2.4,ctf-4,F10G8.9,gad-1,scbp-2,cdk-4,rpb-7,mdt-19,zyg-11,rab-10,acl-12,ZK643.2,asb-1,ragc-1,gly-2,ZC190.4,mcm-5,art-1,tbx-11,let-711,srp-7,Y105E8B.9,aars-2,eme-1,C04E12.4,gut-2,M03F8.3,hmgr-1,uvs-1,snpc-1.2,let-19,zyg-1,coq-4,snr-4,nars-1,phb-2,ufd-2,K07E3.4,acly-1,ZK1067.2,rpc-25,zmp-4,cku-80,trxr-2,clp-1,B0495.8,nhr-1,ubl-1,zhit-1,glp-1,idh-2,row-1,tbp-1,clsp-1,epg-5,top-1,pir-1,rad-54,cyb-2.1,let-721,ned-8,rtcb-1,tag-124,xpd-1,dph-1,sod-3,aagr-2,moc-3,spcs-1,dbr-1,pgp-4,C50F4.16,rpl-12,rars-2,spcs-2,png-1,iars-2,rnf-113,gsp-1,abhd-3.2,B0261.1,spds-1,unc-22,cblc-1,fan-1,dtmk-1,zmp-3,asm-2,glp-4,tftc-1,rbm-7,fos-1,gtf-2H3,vps-28,ddx-23,tir-1,henn-1,dpy-27,pelo-1,W09C5.1,sesn-1,pqn-85,hgo-1,prdh-1,vps-22,aass-1,ptp-3,xdh-1,ruvb-1,hxk-2,cbp-1,crn-5,taf-6.1,edc-3,tra-1,uggt-1,snr-3,bet-1,aagr-3,enpl-1,sptf-3,Y39A1A.14,ppk-3,daf-31,msp-79,cox-4,puf-12,ksr-2,nxf-2,F58F12.1,cwc-15,T24G10.2,eif-2Bgamma,B0336.13,ssl-1,mars-1,gsk-3,cdd-1,pas-4,rbx-2,rei-1,exos-4.2,ulp-4,lonp-2,csnk-1,mel-46,fkh-4,apn-1,Y57G11C.43,lsm-8,B0491.1,mtcu-1,atp-3,ari-1.4,sur-2,F10B5.2,spg-7,csb-1,ubc-9,imp-1,B0464.9,pbs-4,brc-1,apc-17,eif-2gamma,ufc-1,smc-5,ZK688.9,alkb-8,madf-8,gld-3,cep-1,aph-1,rpt-1,fitm-2,lin-28,F45G2.9,inos-1,mppb-1,elpc-3,gcy-12,coq-8,laat-1,sgt-1,akt-1,thoc-3,mans-2,panl-2,rps-8,syf-1,smg-8,pak-2,suox-1,dscc-1,tbx-36,fum-1,mtm-6,mbk-1,kin-19,mog-2,C16C10.8,T13F2.2,eif-2beta,pbs-6,pkn-1,aco-2,etc-1,syp-1,wapl-1,ippk-1,sucl-2,C16A3.6,lin-53,exos-2,atg-5,wars-1,egl-21,asp-4,mxl-1,gfm-1,got-2.1,T01G5.1,T26A8.4,spsb-2,crn-4,sma-5,smg-7,rpn-5,efk-1,T08A11.1,cyp-33C9,nuo-1,Y57G11C.3,smu-1,age-1,F21D5.6,gfat-2,Y66D12A.9,mtr-4,Y71F9B.2,asg-2,T24B8.7,rsp-4,F45G2.10,prx-10,F08B4.7,rpap-2,Y75B8A.31,vps-25,sel-11,gars-1,peli-1,phy-3,D2013.6,epg-3,tgt-1,riok-3,mys-2,smu-2,C37C3.18,F25H8.1,BE10.3,Y69A2AR.18,bud-31,F53E10.6,zhp-3,thoc-2,hpk-1,K07A12.1,pole-2,eef-1B.2,sel-5,jmjd-5,pyk-1,F52C6.3,hda-4,C04B4.2,cand-1,phf-30,dph-3,tofu-6,mrpl-51,B0491.7,dlk-1,natb-1,bath-42,srp-6,rad-51,agt-1,drsh-1,kars-1,M18.6,gdh-1,mel-32,Y71H2AM.3,K11H12.1,wah-1,F37A4.2,cyn-3,ech-6,ari-1.3,ZC434.4,T20D4.3,R04F11.3,set-10,C15C6.4
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        pfas-1,F38B6.4,zip-9,gna-2,nhr-2,ceh-49,sucl-1,cat-4,F15E6.6,tbh-1,ampd-1,F21D5.1,rnr-2,ubc-12,aptf-4,exos-3,gei-3,lig-1,gpd-2,rfc-4,unc-37,rde-4,fah-1,tads-1,psf-2,tbx-9,tyms-1,Y53C12B.1,dpyd-1,rrp-8,orc-4,tdc-1,nyn-1,his-24,F44B9.8,dhs-28,sucg-1,snpc-1.1,F19F10.9,cyn-13,damt-1,R02D3.8,prp-31,B0024.11,F54C8.4,snfc-5,msi-1,nmat-2,asb-2,hel-1,smg-5,F43G6.5,R06C1.4,pot-2,ssup-72,atp-5,trm-1,K05C4.5,pfk-1.1,psf-1,aptf-2,suca-1,rpa-2,smg-4,slx-1,polh-1,pgk-1,B0511.6,sld-5,lin-37,sup-46,T23G11.4,cids-1,E01A2.2,rpb-4,F55F10.1,T13H5.4,C05D11.9,plrg-1,rpt-4,orc-5,C18B2.4,syp-2,him-5,F29A7.6,rsp-1,sir-2.1,pri-1,Y47D3A.14,rnp-2,djr-1.1,mfap-1,smo-1,rbm-3.2,T15H9.6,T12A2.1,snr-1,sars-2,clpf-1,F46B6.4,Y54E2A.4,C37C3.2,rpm-1,skp-1,tceb-3,E02D9.1,ZK1127.3,T01B7.5,gpd-3,lin-54,thoc-5,rsp-6,rpa-1,C27F2.10,isy-1,taf-11.2,taf-9,T26G10.1,hxk-1,smg-6,lpin-1,F54D1.6,F32B6.3,rnh-2,dhfr-1,msh-2,bas-1,flh-3,F58G1.2,mdt-20,rtel-1,F10E7.5,W02B12.10,K07C5.3,crn-3,ZK795.3,Y53G8AR.5,tgt-2,ZK632.4,nhr-114,exo-1,Y38A10A.7,him-14,F40F8.1,rsp-3,Y53G8AR.9,pal-1,D1007.4,snu-23,npp-5,cyc-1,ham-3,T05G5.5,trt-1,B0361.2,hif-1,mttu-1,Y51H4A.15,ubc-1,evl-18,tsen-2,snr-2,mus-81,elpc-1,gcy-28,cpsf-2,C27F2.4,clk-2,lis-1,pch-2,mec-15,T24C4.5,mes-6,pps-1,cpsf-3,C28G1.4,cids-2,cdk-7,wdr-12,leo-1,exos-7,F10E9.5,C06G3.5,adr-2,rsp-5,taf-10,hpd-1,iars-1,his-3,mut-2,xpa-1,cit-1.1,lsm-1,mys-1,bir-1,him-18,pars-1,sbp-1,sir-2.4,ctf-4,gad-1,rpb-7,mdt-19,ZK643.2,asb-1,ZC190.4,mcm-5,tbx-11,let-711,aars-2,eme-1,gut-2,M03F8.3,uvs-1,snpc-1.2,let-19,snr-4,nars-1,K07E3.4,acly-1,ZK1067.2,rpc-25,cku-80,B0495.8,nhr-1,glp-1,tbp-1,top-1,pir-1,rad-54,rtcb-1,tag-124,xpd-1,moc-3,dbr-1,C50F4.16,rpl-12,rars-2,iars-2,rnf-113,B0261.1,cblc-1,fan-1,dtmk-1,glp-4,tftc-1,rbm-7,fos-1,gtf-2H3,ddx-23,tir-1,henn-1,pelo-1,W09C5.1,pqn-85,xdh-1,ruvb-1,hxk-2,cbp-1,crn-5,taf-6.1,edc-3,tra-1,snr-3,bet-1,sptf-3,Y39A1A.14,daf-31,F58F12.1,cwc-15,T24G10.2,B0336.13,mars-1,cdd-1,exos-4.2,mel-46,fkh-4,apn-1,Y57G11C.43,lsm-8,mtcu-1,atp-3,sur-2,F10B5.2,csb-1,ubc-9,brc-1,smc-5,alkb-8,madf-8,gld-3,cep-1,rpt-1,lin-28,F45G2.9,elpc-3,gcy-12,panl-2,rps-8,syf-1,smg-8,tbx-36,mbk-1,mog-2,C16C10.8,T13F2.2,eif-2beta,syp-1,wapl-1,sucl-2,C16A3.6,lin-53,exos-2,wars-1,mxl-1,T26A8.4,crn-4,smg-7,smu-1,F21D5.6,gfat-2,mtr-4,asg-2,rsp-4,prx-10,F08B4.7,rpap-2,Y75B8A.31,gars-1,D2013.6,tgt-1,riok-3,mys-2,smu-2,C37C3.18,F25H8.1,Y69A2AR.18,bud-31,F53E10.6,K07A12.1,pole-2,pyk-1,hda-4,phf-30,dph-3,dlk-1,rad-51,agt-1,drsh-1,kars-1,mel-32,K11H12.1,wah-1,ZC434.4,R04F11.3,C15C6.4
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                sepa-1,F57F5.1,fat-7,aagr-1,cpr-5,epg-2,K10C2.1,gst-10,Y16B4A.2,cht-3,cpr-4,skr-8,acs-2,cpl-1,pfas-1,ubc-7,sod-2,skr-10,skr-7,F38B6.4,aco-1,dhs-25,W10C8.5,icl-1,skr-9,ges-1,zip-9,T22D1.5,gna-2,nhr-2,ceh-49,lgg-1,skr-13,sucl-1,gas-1,cat-4,ldp-1,F41C3.5,F15E6.6,tps-2,cdc-25.3,tbh-1,pmt-2,ampd-1,asp-14,F21D5.1,kin-29,W02B12.1,plc-3,emb-27,rnr-2,ubc-12,aptf-4,exos-3,gei-3,lig-1,swah-1,gyg-1,cht-1,gpd-2,cdk-1,sdha-1,iff-1,rfc-4,unc-37,rde-4,cpz-2,cup-2,npl-4.1,fah-1,tads-1,psf-2,tbx-9,tyms-1,tax-6,Y53C12B.1,dpyd-1,rrp-8,orc-4,ivd-1,tdc-1,nyn-1,T16G12.1,his-24,F44B9.8,dhs-28,pgph-3,sucg-1,csn-3,skr-15,K04C1.5,snpc-1.1,F19F10.9,tald-1,lgg-3,cyn-13,damt-1,R02D3.8,C01G10.7,ced-9,prp-31,F13D12.6,B0024.11,F54C8.4,pbs-7,snfc-5,msi-1,nmat-2,gst-7,asb-2,hel-1,asfl-1,acl-1,smg-5,F43G6.5,cks-1,R06C1.4,R02F2.9,pot-2,ssup-72,F32A5.3,atp-5,skr-2,gst-1,ZK550.3,pas-2,E01A2.5,trm-1,Y50D4B.4,K05C4.5,prpf-4,pfk-1.1,pas-5,K08D12.3,psf-1,aptf-2,suca-1,air-2,maoc-1,rpa-2,ver-3,bec-1,C27H6.9,rict-1,smg-4,slx-1,acl-7,cts-1,polh-1,pfkb-1.2,pgk-1,puf-8,rpn-12,B0511.6,sld-5,T05A12.4,lin-37,dgk-4,sup-46,cyb-2.2,T23G11.4,cids-1,E01A2.2,rpb-4,F55F10.1,erfa-1,ogdh-1,rbx-1,T13H5.4,C05D11.9,plrg-1,rpt-4,orc-5,ZK686.3,acl-3,C18B2.4,csn-5,syp-2,him-5,F29A7.6,idha-1,rsp-1,sir-2.1,pri-1,isp-1,Y47D3A.14,pbs-3,rnp-2,djr-1.1,hat-1,pgph-2,mfap-1,pam-1,smo-1,ZK524.4,got-2.2,rbm-3.2,bath-41,T15H9.6,T12A2.1,rpl-31,snr-1,F45H10.3,nipi-3,sars-2,clpf-1,F46B6.4,Y54E2A.4,C37C3.2,rpm-1,skp-1,tceb-3,E02D9.1,plpp-1.1,ZK1127.3,T01B7.5,cdc-48.3,gpd-3,lin-54,ggtb-1,thoc-5,glod-4,rsp-6,rpa-1,C27F2.10,isy-1,taf-11.2,C16C8.4,taf-9,ubc-18,T26G10.1,hxk-1,sdhd-1,hpo-11,smg-6,lpin-1,cyn-10,slc-25A18.2,hphd-1,F54D1.6,ZK973.1,F32B6.3,dagl-2,rnh-2,gly-13,dhfr-1,C56G2.5,msh-2,skr-14,cdk-8,iff-2,bas-1,flh-3,coq-6,F58G1.2,mdt-20,rtel-1,F10E7.5,W02B12.10,allo-1,W04B5.5,K07C5.3,crn-3,ZK795.3,Y53G8AR.5,nubp-1,emc-6,tgt-2,ZK632.4,nhr-114,F08F8.4,exo-1,Y38A10A.7,him-14,F40F8.1,unc-26,let-767,rsp-3,Y53G8AR.9,nuo-5,pal-1,D1007.4,snu-23,npp-5,cpr-1,ipla-1,mev-1,cyc-1,ham-3,T05G5.5,F56D2.2,trt-1,B0361.2,hif-1,mttu-1,F49C12.11,Y51H4A.15,ubc-1,evl-18,tsen-2,phb-1,eef-1A.2,F10G7.10,snr-2,mus-81,rpn-8,elpc-1,C16C10.2,gcy-28,cpsf-2,famk-1,pigv-1,aos-1,cyn-12,C27F2.4,clk-2,lis-1,gst-24,nduo-5,pch-2,mec-15,bcat-1,B0511.7,mppa-1,cpr-6,T24C4.5,rpl-22,gst-20,piki-1,mes-6,pps-1,cpsf-3,C28G1.4,cyn-5,cids-2,cdk-7,raga-1,ubr-1,rpl-29,clr-1,wdr-12,gst-39,leo-1,wwp-1,phy-2,exos-7,nduo-6,F10E9.5,C06G3.5,ucr-1,eif-2Balpha,fem-3,cdk-11.2,gsto-1,adr-2,C33G3.4,rsp-5,taf-10,rbbp-5,bli-3,hpd-1,iars-1,his-3,lpd-8,ppfr-4,mut-2,xpa-1,dyn-1,npl-4.2,cit-1.1,lsm-1,mys-1,pas-3,bir-1,dhhc-6,him-18,Y105E8A.2,pars-1,sbp-1,B0393.3,sir-2.4,ctf-4,F10G8.9,gad-1,scbp-2,cdk-4,rpb-7,mdt-19,zyg-11,rab-10,acl-12,ZK643.2,asb-1,ragc-1,gly-2,ZC190.4,mcm-5,art-1,tbx-11,let-711,srp-7,Y105E8B.9,aars-2,eme-1,C04E12.4,gut-2,M03F8.3,hmgr-1,uvs-1,snpc-1.2,let-19,zyg-1,coq-4,snr-4,nars-1,phb-2,ufd-2,K07E3.4,acly-1,ZK1067.2,rpc-25,cku-80,B0495.8,nhr-1,ubl-1,zhit-1,glp-1,idh-2,row-1,tbp-1,clsp-1,epg-5,top-1,pir-1,rad-54,cyb-2.1,let-721,ned-8,rtcb-1,tag-124,xpd-1,dph-1,sod-3,aagr-2,moc-3,spcs-1,dbr-1,C50F4.16,rpl-12,rars-2,spcs-2,png-1,iars-2,rnf-113,gsp-1,abhd-3.2,B0261.1,spds-1,unc-22,cblc-1,fan-1,dtmk-1,asm-2,glp-4,tftc-1,rbm-7,fos-1,gtf-2H3,vps-28,ddx-23,tir-1,henn-1,pelo-1,W09C5.1,sesn-1,pqn-85,hgo-1,prdh-1,vps-22,aass-1,ptp-3,xdh-1,ruvb-1,hxk-2,cbp-1,crn-5,taf-6.1,edc-3,tra-1,uggt-1,snr-3,bet-1,aagr-3,enpl-1,sptf-3,Y39A1A.14,ppk-3,daf-31,msp-79,cox-4,puf-12,ksr-2,F58F12.1,cwc-15,T24G10.2,eif-2Bgamma,B0336.13,mars-1,gsk-3,cdd-1,pas-4,rbx-2,rei-1,exos-4.2,ulp-4,csnk-1,mel-46,fkh-4,apn-1,Y57G11C.43,lsm-8,B0491.1,mtcu-1,atp-3,ari-1.4,sur-2,F10B5.2,spg-7,csb-1,ubc-9,imp-1,B0464.9,pbs-4,brc-1,apc-17,eif-2gamma,ufc-1,smc-5,ZK688.9,alkb-8,madf-8,gld-3,cep-1,aph-1,rpt-1,fitm-2,lin-28,F45G2.9,inos-1,mppb-1,elpc-3,gcy-12,coq-8,laat-1,sgt-1,akt-1,mans-2,panl-2,rps-8,syf-1,smg-8,pak-2,suox-1,dscc-1,tbx-36,fum-1,mtm-6,mbk-1,kin-19,mog-2,C16C10.8,T13F2.2,eif-2beta,pbs-6,pkn-1,aco-2,etc-1,syp-1,wapl-1,ippk-1,sucl-2,C16A3.6,lin-53,exos-2,atg-5,wars-1,egl-21,mxl-1,gfm-1,got-2.1,T01G5.1,T26A8.4,spsb-2,crn-4,smg-7,rpn-5,efk-1,T08A11.1,cyp-33C9,nuo-1,Y57G11C.3,smu-1,age-1,F21D5.6,gfat-2,Y66D12A.9,mtr-4,Y71F9B.2,asg-2,T24B8.7,rsp-4,F45G2.10,prx-10,F08B4.7,rpap-2,Y75B8A.31,vps-25,sel-11,gars-1,peli-1,phy-3,D2013.6,epg-3,tgt-1,riok-3,mys-2,smu-2,C37C3.18,F25H8.1,Y69A2AR.18,bud-31,F53E10.6,zhp-3,hpk-1,K07A12.1,pole-2,eef-1B.2,sel-5,jmjd-5,pyk-1,F52C6.3,hda-4,cand-1,phf-30,dph-3,tofu-6,mrpl-51,B0491.7,dlk-1,natb-1,bath-42,srp-6,rad-51,agt-1,drsh-1,kars-1,M18.6,gdh-1,mel-32,Y71H2AM.3,K11H12.1,wah-1,F37A4.2,cyn-3,ech-6,ari-1.3,ZC434.4,T20D4.3,R04F11.3,set-10,C15C6.4
#interactive plot highlighting significant terms
gostplot(gostres, 
         capped = TRUE, # all pvalues less than 10e-16 will appear above this line together
         interactive = TRUE)
#create a table with results
publish_gosttable(gostres, highlight_terms = gostres$result[c(1:10),],
                   use_colors = TRUE, 
                   show_columns = c("source", "term_name", "term_size", "intersection_size"),
                   filename = NULL)
## The input 'highlight_terms' is a data.frame. The column 'term_id' will be used.

gostres$result[,c("term_id", "p_value")]
##         term_id      p_value
## 1    GO:0009057 1.205155e-07
## 2    GO:0008150 4.284938e-07
## 3    GO:0044265 1.049533e-06
## 4    GO:0008152 1.049533e-06
## 5    GO:0006725 1.049533e-06
## 6    GO:0044237 1.049533e-06
## 7    GO:1901360 1.049533e-06
## 8    GO:0006139 1.259234e-06
## 9    GO:0016071 1.259234e-06
## 10   GO:0046483 2.247224e-06
## 11   GO:0006396 1.046046e-05
## 12   GO:0044238 1.162555e-05
## 13   GO:0009987 1.845521e-05
## 14   GO:0043170 1.845521e-05
## 15   GO:0006807 1.845521e-05
## 16   GO:0071704 1.845521e-05
## 17   GO:0034641 2.034768e-05
## 18   GO:1901575 3.224208e-05
## 19   GO:0030163 3.481456e-05
## 20   GO:0009056 3.481456e-05
## 21   GO:0090304 3.681773e-05
## 22   GO:1901565 5.972754e-05
## 23   GO:0051603 6.328325e-05
## 24   GO:0006397 1.354888e-04
## 25   GO:0044257 2.822246e-04
## 26   GO:0006508 4.053842e-04
## 27   GO:0044248 4.053842e-04
## 28   GO:0008380 7.322136e-04
## 29   GO:0034660 7.322136e-04
## 30   GO:0000375 7.955584e-04
## 31   GO:0000377 7.955584e-04
## 32   GO:0000398 7.955584e-04
## 33   GO:0006399 9.582505e-04
## 34   GO:0043632 1.094595e-03
## 35   GO:0098813 2.177628e-03
## 36   GO:0010498 2.627028e-03
## 37   GO:0022613 2.837748e-03
## 38   GO:0071840 4.129183e-03
## 39   GO:0008298 4.668155e-03
## 40   GO:0019941 5.058237e-03
## 41   GO:0034470 5.397238e-03
## 42   GO:0006091 5.649420e-03
## 43   GO:0043161 5.649420e-03
## 44   GO:0002097 6.394963e-03
## 45   GO:0000280 6.401481e-03
## 46   GO:0016070 6.813825e-03
## 47   GO:0006259 7.701393e-03
## 48   GO:0006099 8.025912e-03
## 49   GO:0051704 8.223092e-03
## 50   GO:0002098 8.554580e-03
## 51   GO:0007059 9.014150e-03
## 52   GO:0006950 9.627332e-03
## 53   GO:0022618 9.627332e-03
## 54   GO:0051276 9.627332e-03
## 55   GO:0000956 9.627332e-03
## 56   GO:0048285 1.058706e-02
## 57   GO:0000819 1.138133e-02
## 58   GO:0044085 1.188929e-02
## 59   GO:0006402 1.391647e-02
## 60   GO:0071826 1.397883e-02
## 61   GO:0006511 1.413378e-02
## 62   GO:0071033 1.536505e-02
## 63   GO:0006448 1.691627e-02
## 64   GO:0043933 1.900789e-02
## 65   GO:0051321 1.900789e-02
## 66   GO:0044260 2.051974e-02
## 67   GO:0043648 2.111246e-02
## 68   GO:0007292 2.130892e-02
## 69   GO:0019439 2.130892e-02
## 70   GO:0090068 2.253772e-02
## 71   GO:0016043 2.253772e-02
## 72   GO:0006974 2.311367e-02
## 73   GO:0006401 2.507517e-02
## 74   GO:0010467 2.967063e-02
## 75   GO:1901361 3.235640e-02
## 76   GO:0055086 3.708367e-02
## 77   GO:0008033 3.717589e-02
## 78   GO:0071028 4.292659e-02
## 79   GO:0071027 4.292659e-02
## 80   GO:0000460 4.342664e-02
## 81   GO:0005622 1.185705e-13
## 82   GO:0043229 2.075153e-12
## 83   GO:0005634 2.216125e-12
## 84   GO:0043226 1.095754e-10
## 85   GO:0043231 5.858941e-10
## 86   GO:0032991 5.858941e-10
## 87   GO:0005575 1.578871e-09
## 88   GO:0043227 4.458230e-09
## 89   GO:0110165 3.106655e-08
## 90   GO:1902494 4.865466e-06
## 91   GO:0031981 7.639010e-06
## 92   GO:0031974 3.514672e-05
## 93   GO:0043233 4.639301e-05
## 94   GO:0070013 5.545420e-05
## 95   GO:0005681 2.052904e-04
## 96   GO:0043228 9.339241e-04
## 97   GO:0043232 9.339241e-04
## 98   GO:0005654 1.354860e-03
## 99   GO:0005737 1.469126e-03
## 100  GO:0005694 4.275547e-03
## 101  GO:0098687 6.028327e-03
## 102  GO:0071013 6.028327e-03
## 103  GO:0099080 6.263476e-03
## 104  GO:0005657 7.257216e-03
## 105  GO:0031967 1.119636e-02
## 106  GO:0031975 1.119636e-02
## 107  GO:0043596 1.782733e-02
## 108  GO:0071014 1.782733e-02
## 109  GO:0000775 2.037388e-02
## 110  GO:0000974 2.454604e-02
## 111  GO:0097526 2.480865e-02
## 112  GO:1990904 2.480865e-02
## 113  GO:0031261 3.116630e-02
## 114  GO:0031966 3.643344e-02
## 115  GO:0000776 3.669546e-02
## 116  GO:0005604 3.669546e-02
## 117  GO:0005839 3.675051e-02
## 118  GO:0005740 3.818404e-02
## 119  GO:0031012 3.818404e-02
## 120  GO:0120114 4.714118e-02
## 121  GO:0032993 4.896926e-02
## 122  GO:0003674 4.163542e-05
## 123  GO:0005488 9.574305e-04
## 124  GO:0003824 1.461901e-03
## 125  GO:0005515 3.493135e-03
## 126  GO:0046983 1.147517e-02
## 127  GO:0016787 1.246257e-02
## 128  GO:0004175 2.445643e-02
## 129  GO:0008233 2.445643e-02
## 130  GO:0070011 2.445643e-02
## 131  GO:0042803 3.909702e-02
## 132  GO:0042802 3.909702e-02
## 133  GO:0015399 4.468810e-02
## 134  KEGG:00000 1.280849e-10
## 135  KEGG:03040 1.268123e-04
## 136  KEGG:00020 1.335073e-04
## 137  KEGG:01200 1.055262e-03
## 138  KEGG:01210 8.074470e-03
## 139  KEGG:03015 2.992449e-02
## 140  KEGG:01100 3.018273e-02
## 141  KEGG:04120 4.921189e-02
## 142  KEGG:04142 4.921189e-02
## 143   TF:M00000 8.639100e-10
## 144   TF:M08052 2.285678e-06
## 145 TF:M08052_0 2.285678e-06
## 146 TF:M08052_1 2.319846e-06
## 147   WP:WP1411 1.866870e-02
## 148   WP:000000 1.866870e-02
#save results for GO terms only so they can be uploaded to REVIGO
t<- gostres$result[,c("term_id", "p_value")]
write.table(t[grep("GO", t$term_id),], "results/gProfiler_GO.txt", quote=FALSE, row.names = FALSE, col.names = FALSE)

#Then copy and pase the content of gProfiler_GO.txt into REVIGO at: http://revigo.irb.hr/revigo.jsp
#and download the R script for plotting from that website. REVIGO reduces the GO terms by removng redundancy and making it easier to look at the results.
#Can download the treemap script or the other R scripts from the individual tabs
#NOTE: The treemap R script needs a small modification to avoid warnings (change tmPlot() to treemap())

Visualisation of gProfiler GO analysis with REVIGO plots:

Alt text

Alt text

Session information

date()
## [1] "Sun Jul  5 11:46:47 2020"
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] gprofiler2_0.1.9            treemap_2.4-2              
##  [3] gdtools_0.2.2               PCAtools_1.2.0             
##  [5] lattice_0.20-41             reshape2_1.4.4             
##  [7] apeglm_1.8.0                DESeq2_1.26.0              
##  [9] SummarizedExperiment_1.16.1 DelayedArray_0.12.3        
## [11] BiocParallel_1.20.1         matrixStats_0.56.0         
## [13] pathview_1.26.0             org.Hs.eg.db_3.10.0        
## [15] cowplot_1.0.0               ggupset_0.3.0              
## [17] enrichplot_1.6.1            svglite_1.2.3              
## [19] clusterProfiler_3.14.3      pheatmap_1.0.12            
## [21] EnhancedVolcano_1.4.0       ggrepel_0.8.2              
## [23] Rgraphviz_2.30.0            topGO_2.38.1               
## [25] SparseM_1.78                graph_1.64.0               
## [27] org.Ce.eg.db_3.10.1         GO.db_3.10.0               
## [29] goseq_1.38.0                geneLenDataBase_1.22.0     
## [31] BiasedUrn_1.07              reactome.db_1.70.0         
## [33] AnnotationDbi_1.48.0        Biobase_2.46.0             
## [35] biomaRt_2.42.1              ggplot2_3.3.2              
## [37] dplyr_1.0.0                 tximport_1.14.2            
## [39] sleuth_0.30.0               tidyr_1.1.0                
## [41] rtracklayer_1.46.0          GenomicRanges_1.38.0       
## [43] GenomeInfoDb_1.22.1         IRanges_2.20.2             
## [45] S4Vectors_0.24.4            BiocGenerics_0.32.0        
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.1           coda_0.19-3              acepack_1.4.1           
##   [4] bit64_0.9-7              knitr_1.29               irlba_2.3.3             
##   [7] data.table_1.12.8        rpart_4.1-15             KEGGREST_1.26.1         
##  [10] RCurl_1.98-1.2           generics_0.0.2           GenomicFeatures_1.38.2  
##  [13] RSQLite_2.2.0            europepmc_0.4            bit_1.1-15.2            
##  [16] xml2_1.3.2               httpuv_1.5.4             assertthat_0.2.1        
##  [19] viridis_0.5.1            xfun_0.15                hms_0.5.3               
##  [22] evaluate_0.14            promises_1.1.1           fansi_0.4.1             
##  [25] progress_1.2.2           dbplyr_1.4.4             igraph_1.2.5            
##  [28] DBI_1.1.0                geneplotter_1.64.0       htmlwidgets_1.5.1       
##  [31] purrr_0.3.4              ellipsis_0.3.1           crosstalk_1.1.0.1       
##  [34] backports_1.1.8          annotate_1.64.0          gridBase_0.4-7          
##  [37] vctrs_0.3.1              withr_2.2.0              ggforce_0.3.2           
##  [40] triebeard_0.3.0          bdsmatrix_1.3-4          checkmate_2.0.0         
##  [43] GenomicAlignments_1.22.1 prettyunits_1.1.1        cluster_2.1.0           
##  [46] DOSE_3.12.0              lazyeval_0.2.2           crayon_1.3.4            
##  [49] genefilter_1.68.0        pkgconfig_2.0.3          labeling_0.3            
##  [52] tweenr_1.0.1             nlme_3.1-148             nnet_7.3-14             
##  [55] rlang_0.4.6              lifecycle_0.2.0          BiocFileCache_1.10.2    
##  [58] rsvd_1.0.3               polyclip_1.10-0          Matrix_1.2-18           
##  [61] urltools_1.7.3           Rhdf5lib_1.8.0           base64enc_0.1-3         
##  [64] ggridges_0.5.2           png_0.1-7                viridisLite_0.3.0       
##  [67] bitops_1.0-6             Biostrings_2.54.0        blob_1.2.1              
##  [70] DelayedMatrixStats_1.8.0 stringr_1.4.0            qvalue_2.18.0           
##  [73] jpeg_0.1-8.1             gridGraphics_0.5-0       scales_1.1.1            
##  [76] memoise_1.1.0            magrittr_1.5             plyr_1.8.6              
##  [79] zlibbioc_1.32.0          compiler_3.6.3           dqrng_0.2.1             
##  [82] bbmle_1.0.23.1           RColorBrewer_1.1-2       KEGGgraph_1.46.0        
##  [85] Rsamtools_2.2.3          cli_2.0.2                XVector_0.26.0          
##  [88] htmlTable_2.0.0          Formula_1.2-3            MASS_7.3-51.6           
##  [91] mgcv_1.8-31              tidyselect_1.1.0         stringi_1.4.6           
##  [94] emdbook_1.3.12           yaml_2.2.1               GOSemSim_2.12.1         
##  [97] BiocSingular_1.2.2       askpass_1.1              locfit_1.5-9.4          
## [100] latticeExtra_0.6-29      fastmatch_1.1-0          tools_3.6.3             
## [103] rstudioapi_0.11          foreign_0.8-75           gridExtra_2.3           
## [106] farver_2.0.3             ggraph_2.0.3             digest_0.6.25           
## [109] rvcheck_0.1.8            BiocManager_1.30.10      shiny_1.5.0             
## [112] Rcpp_1.0.4               later_1.1.0.1            httr_1.4.1              
## [115] colorspace_1.4-1         XML_3.99-0.3             splines_3.6.3           
## [118] graphlayouts_0.7.0       ggplotify_0.0.5          plotly_4.9.2.1          
## [121] systemfonts_0.2.3        xtable_1.8-4             jsonlite_1.7.0          
## [124] tidygraph_1.2.0          R6_2.4.1                 Hmisc_4.4-0             
## [127] pillar_1.4.4             htmltools_0.5.0          mime_0.9                
## [130] glue_1.4.1               fastmap_1.0.1            fgsea_1.12.0            
## [133] mvtnorm_1.1-1            utf8_1.1.4               tibble_3.0.1            
## [136] numDeriv_2016.8-1.1      curl_4.3                 openssl_1.4.2           
## [139] survival_3.2-3           aggregation_1.0.1        rmarkdown_2.3           
## [142] munsell_0.5.0            DO.db_2.9                rhdf5_2.30.1            
## [145] GenomeInfoDbData_1.2.2   gtable_0.3.0
#save.image(file="neurexin_analysis.Rdata")